Ahana Cloud for Presto is a SaaS managed service that speeds up leading business intelligence, data visualization, and SQL tools like Trifacta.
Interactive, ad hoc queries and faster data exploration for Trifacta
What Is Trifacta?
Trifacta is a cloud-based data engineering front-end tool that data analysts and data scientists use to leverage an organization’s data. It makes data collection and consolidation, preparation, transformation, and the creation of data pipelines easier. These data pipelines can then be used for different use cases like business intelligence (BI), machine learning, and data analytics.
Trifacta is a cloud-native solution meant to make it easy to use data in different formats and structures for tasks like data analysis and machine learning. It does this by letting the user carry out data cleaning, data exploration, data preparation, data validation, and pipeline creation through a visual interface. It is supported on different cloud platforms including AWS, Google Cloud, and Microsoft Azure.
Trifacta is used to create data engineering pipelines for data exploration in a cloud environment. One way to use Trifacta is through direct connections to individual supported data stores. The other way is to connect Trifacta to a distributed query engine like Presto, which enables higher performance, higher concurrency, and instant, seamless access to multiple data sources.
What Is Presto?
Presto is an in-memory distributed query engine with a connector-based architecture that reaches disparate data sources such as S3 cloud storage and relational and NoSQL databases. It was developed at Facebook to deliver lightning-fast query responses from its HDFS data lake / warehouse. Since then, it has been adopted by hundreds of other companies, including Uber, Twitter, Amazon, and Alibaba.
A Presto cluster consists of a single coordinator and several worker nodes. The worker nodes are responsible for connecting to various sources and transparently carrying out query processing in a distributed and parallel manner. The computational power of a cluster can thus be increased by adding worker nodes. This has made it an efficient choice for organizations with different data formats and sources and/or a large amount of data to process.
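The coordinator-and-workers idea can be illustrated with a toy scatter-gather model in Python. This is only a sketch of the general pattern, not Presto's actual scheduler: the function names (make_splits, worker_partial_sum, coordinator_sum) are hypothetical, and threads stand in for worker nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def make_splits(rows, num_splits):
    """Divide the input rows into num_splits roughly equal chunks."""
    return [rows[i::num_splits] for i in range(num_splits)]


def worker_partial_sum(split):
    """A 'worker' computes a partial aggregate over its own split."""
    return sum(split)


def coordinator_sum(rows, num_workers):
    """The 'coordinator' plans splits, fans them out, and merges the partial results."""
    splits = make_splits(rows, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(worker_partial_sum, splits))
    return sum(partials)


rows = list(range(1, 101))

# The answer is the same regardless of how many workers share the work;
# only the amount of work per worker changes.
total_2 = coordinator_sum(rows, num_workers=2)
total_8 = coordinator_sum(rows, num_workers=8)
print(total_2, total_8)
```

In this model, adding workers shrinks each split, which is the intuition behind scaling a cluster by adding worker nodes.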
Faster Queries and Unified Access to more Data using Trifacta and PrestoDB
As organizations become more data driven, both end-user computing and data access requirements are increasing. The solutions therefore must have high performance and be scalable to meet the demands placed on today’s data platform teams, who must respond quickly. This calls for the adoption of an open, flexible, distributed architecture. Combining Trifacta and Presto enables organizations to create a highly scalable, distributed, and modern data engineering platform.
A typical architecture consists of Trifacta connected to a Presto cluster with one or more connected data sources. Presto handles the data access and in-memory processing of queries. Trifacta handles the visualization, reporting, and data wrangling tasks. This allows the Presto cluster to be scaled by adding or removing processing nodes to meet the requirements of the Trifacta users. Integrating them offers other benefits such as data federation, fast query processing, and the ability to run separate clusters, each optimized for a particular data engineering workload.
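Data federation in Presto rests on addressing tables as catalog.schema.table, so a single query can join tables that live in different sources. The Python sketch below only builds such a query as a string. The catalog, schema, table, and column names are hypothetical and would depend on how the cluster's connectors are configured.

```python
def qualified(catalog, schema, table):
    """Return a Presto-style fully qualified table name: catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"


# Hypothetical catalogs and tables, for illustration only.
orders = qualified("hive", "sales", "orders")        # e.g. files in S3
customers = qualified("mysql", "crm", "customers")   # e.g. a relational database

# One SQL statement that joins data from two different sources.
federated_sql = (
    "SELECT c.region, SUM(o.amount) AS total_amount\n"
    f"FROM {orders} AS o\n"
    f"JOIN {customers} AS c ON o.customer_id = c.id\n"
    "GROUP BY c.region"
)
print(federated_sql)
```

A tool connected to the cluster would submit a statement like this as ordinary SQL, leaving Presto to read from each source and combine the results.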
Ahana Cloud is the cloud-native SaaS managed service for Presto. See how you can turbocharge Trifacta in only 30 minutes!