Presto and AWS S3

What is AWS S3?

Amazon Simple Storage Service (Amazon S3) is object storage for the internet. You can use it to store and retrieve any amount of data at any time, from anywhere on the web, through the AWS Management Console, the AWS CLI, the AWS SDKs, or the REST API.

What is Presto?

Presto (PrestoDB) is a federated SQL query engine that lets data engineers and analysts run interactive, ad hoc analytics on large, fast-growing volumes of data spread across a wide range of data lakes and databases. Many organizations are adopting Presto as a single engine for querying all of their data sources, and data platform teams increasingly treat it as the de facto SQL engine for in-place analytics: Presto queries data where it is stored, without first moving it into a separate analytics system. Queries execute in parallel on an in-memory architecture, and most results return in seconds.

Why use Presto with AWS S3?

Using S3 with Presto separates storage from compute: data stays in low-cost object storage while the Presto cluster scales with the workload, which can give analysts better performance at a lower cost. Presto lets users quickly query both structured and unstructured data. It is well suited to the cloud, which provides performance, scalability, reliability, availability, and massive economies of scale. On a managed platform, you can launch a Presto cluster in minutes without worrying about node provisioning, cluster setup, Presto configuration, or cluster tuning.

Presto executes queries over data sets exposed by plugins known as connectors. Integrating Presto with S3 gives users several benefits:

  • Presto, running on Amazon EMR, lets developers and analysts run interactive SQL queries that directly access data stored in Amazon S3 for data exploration, ad hoc analysis, and reporting.
  • The Hive connector lets Presto query data that is stored in S3 or S3-compatible object stores and registered in a Hive Metastore (HMS).
  • Data transfer between a Presto cluster and S3 is fully parallelized.
  • Presto can be deployed on AWS serverless offerings, with no servers, virtual machines, or clusters to set up, manage, or tune.
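
As a rough illustration of the Hive connector point above, the sketch below renders the text of a Presto catalog file that points the Hive connector at a Hive Metastore. The property names (connector.name, hive.metastore.uri, hive.s3.endpoint) come from the Presto Hive connector; the function name, the metastore host, and the file location mentioned in the comments are assumptions made for this example, and a real deployment would also need credentials or an IAM role.

```python
from typing import Dict, Optional


def render_hive_catalog(metastore_uri: str, s3_endpoint: Optional[str] = None) -> str:
    """Render the text of a Presto Hive catalog file.

    The result is typically saved as etc/catalog/hive.properties on each node.
    render_hive_catalog is a hypothetical helper written for this article.
    """
    props: Dict[str, str] = {
        # Name of the Hive connector plugin in PrestoDB.
        "connector.name": "hive-hadoop2",
        # Thrift endpoint of the Hive Metastore that holds the table definitions.
        "hive.metastore.uri": metastore_uri,
    }
    if s3_endpoint is not None:
        # Only needed for S3-compatible stores or a non-default S3 endpoint.
        props["hive.s3.endpoint"] = s3_endpoint
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


if __name__ == "__main__":
    # The metastore host below is a placeholder, not a real service.
    print(render_hive_catalog("thrift://metastore.example.internal:9083"), end="")
```

Generating the file from a function is only a convenience for the example; the same two or three lines can be written by hand.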

Because Presto uses ANSI SQL, it’s straightforward to start using. The Presto connector architecture enables federated access to almost any data source, whether a database, a data lake, or another data system, and a cluster can start with one node and scale to thousands. With Presto, you can use SQL to run ad hoc queries whenever you want, wherever your data resides, so you don’t have to ETL data into a separate system.

With an Amazon S3 connector, platform teams can simply point to their data on Amazon S3, define the schema, and start querying using the built-in query editor or their existing Business Intelligence (BI) tools. With Presto and S3, you can mine the treasures in your data quickly and use them to gain new insights for your business and customers.
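
The "point to your data, define the schema, start querying" flow can be sketched as two SQL statements. The Python helper below only builds the statement text; submitting it is left to whichever Presto client or query editor you use. The format and external_location table properties belong to the Presto Hive connector, while the catalog, schema, table, column, and bucket names are hypothetical, and the schema is assumed to already exist.

```python
from typing import Sequence, Tuple


def external_table_ddl(
    table: str,
    columns: Sequence[Tuple[str, str]],
    s3_location: str,
    file_format: str = "PARQUET",
) -> str:
    """Build a CREATE TABLE statement over files that already live in S3.

    Inputs are assumed to be trusted: this sketch does no quoting or escaping.
    """
    column_lines = ",\n".join(f"    {name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE TABLE {table} (\n{column_lines}\n)\n"
        f"WITH (format = '{file_format}', external_location = '{s3_location}')"
    )


# Hypothetical table and bucket, used only for illustration.
DDL = external_table_ddl(
    table="hive.web.page_views",
    columns=[("user_id", "bigint"), ("event_type", "varchar"), ("view_time", "timestamp")],
    s3_location="s3://example-bucket/page_views/",
)

# An ad hoc query against the table defined above.
QUERY = (
    "SELECT event_type, count(*) AS events\n"
    "FROM hive.web.page_views\n"
    "GROUP BY event_type\n"
    "ORDER BY events DESC\n"
    "LIMIT 10"
)

if __name__ == "__main__":
    print(DDL)
    print()
    print(QUERY)
```

Because the table is external, Presto reads the files in place under the given S3 prefix; no data is copied when the table is created.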

If you are ready to use Presto to query your data in AWS S3 but don’t want to take on the complexity or overhead of managing Presto, you can use a managed service such as Ahana Cloud. Ahana Cloud gives you the power of Presto without having to get under the hood. It’s a managed service for AWS with out-of-the-box integrations for AWS S3, AWS Glue, and Hive Metastore (HMS).

You can try Ahana Cloud free for 14 days. Sign up and get started today.