Presto and Amazon Redshift
What is Amazon Redshift?
Amazon Redshift is a cloud data warehouse service that data analysts and data warehouse engineers use to analyze data with standard SQL and their existing Business Intelligence (BI) tools. Users can start with just a few hundred gigabytes of data and scale to a petabyte or more.
What is Presto?
Presto is a federated SQL query engine that data engineers and analysts use to run interactive, ad hoc analytics on large and fast-growing volumes of data spread across a wide range of data lakes and databases. Many organizations are adopting Presto as a single engine for querying all of their available data sources, and data platform teams increasingly use it as the de facto SQL query engine for running analytics across data sources in place. In other words, Presto queries data where it is stored, without first moving it into a separate analytics system. Queries execute in parallel over a memory-based architecture, with most results returning in seconds.
Why Presto and Amazon Redshift?
Analysts can get better performance at a lower cost by using Presto and Redshift together, because workloads can be scaled quickly and automatically. Presto lets users quickly query both structured and unstructured data. Presto is well suited to the cloud, which provides performance, scalability, reliability, availability, and massive economies of scale. You can launch a Presto cluster in minutes, without needing to worry about node provisioning, cluster setup, Presto configuration, or cluster tuning.
Presto executes queries over data sets that are provided by plugins known as Connectors. Integrating Presto with Redshift provides users with new capabilities:
- Presto reads data directly from HDFS, so you don’t need to perform ETL on it first. Presto has also been extended to work with other kinds of data sources, including traditional relational databases and data warehouses such as Redshift.
- The Redshift connector allows users to query and create tables in an external Amazon Redshift cluster. Users can join data between different systems, such as Redshift and Hive, or between two different Redshift clusters. Because Presto on Amazon EMR supports Spot Instances, the total cost of running a data platform can be lower.
- Presto can reduce query execution time, letting queries run in seconds instead of hours. Analysts can iterate quickly on new hypotheses by interactively exploring any dataset, wherever it resides.
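To make the connector and cross-system join described above more concrete, here is a minimal sketch in Python. It only builds text: the contents of a catalog properties file that registers a Redshift cluster with Presto, and a SQL query that joins a Redshift table with a Hive table using `catalog.schema.table` names. The host name, credentials, catalog names (`redshift`, `hive`), and the table and column names are all hypothetical placeholders. The JDBC URL format can differ between Presto distributions and versions, so treat the one shown as an assumption and check your connector documentation.

```python
def redshift_catalog_properties(jdbc_url: str, user: str, password: str) -> str:
    """Render the text of a catalog file, e.g. etc/catalog/redshift.properties.

    The file name (minus ".properties") becomes the catalog name in queries.
    """
    lines = [
        "connector.name=redshift",
        f"connection-url={jdbc_url}",
        f"connection-user={user}",
        f"connection-password={password}",
    ]
    return "\n".join(lines) + "\n"


def federated_join_sql(redshift_table: str, hive_table: str) -> str:
    """Build a query that joins a Redshift table with a Hive table.

    Tables are addressed as catalog.schema.table, so the catalog prefix
    decides which connector serves each side of the join.
    The column names below are hypothetical.
    """
    return (
        "SELECT o.order_id, o.total, c.segment\n"
        f"FROM {redshift_table} AS o\n"
        f"JOIN {hive_table} AS c ON o.customer_id = c.customer_id\n"
        "LIMIT 10"
    )


if __name__ == "__main__":
    # Placeholder connection details; replace with your own cluster's values.
    print(redshift_catalog_properties(
        "jdbc:postgresql://example.net:5439/database", "root", "secret"))
    print(federated_join_sql("redshift.public.orders", "hive.web.customers"))
```

The generated query could then be submitted through the Presto CLI or any Presto client. Joining two Redshift clusters would follow the same pattern, with one catalog file per cluster and a different catalog prefix on each side of the join.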
Because Presto uses ANSI SQL, it’s straightforward to start using. The Presto connector architecture enables federated access to almost any data source, whether a database, a data lake, or another data system, and Presto can start on one node and scale to thousands. With Presto, you can use SQL to run ad hoc queries whenever you want, wherever your data resides, without having to ETL it into a separate system. With a Redshift connector, platform teams can quickly provide access to the datasets analysts are interested in while scaling to meet growing analytics requirements. With Presto and Redshift, you can mine your data quickly and use it to gain new insights for your business and customers.