PrestoDB on AWS
What is PrestoDB on AWS?
Tip: If you want to better understand PrestoDB on AWS, check out the free, downloadable ebook, Learning and Operating Presto. This ebook breaks down what Presto is, how it started, and its best use cases.
To answer the common question of what PrestoDB on AWS is, let’s first define Presto. PrestoDB is an open-source distributed SQL query engine for running interactive analytic queries against a wide range of data sources. Presto was originally developed by Facebook and later donated to the Linux Foundation’s Presto Foundation. It was designed and written from the ground up for interactive analytics, and it approaches the speed of commercial data warehouses while scaling to the size of organizations like Facebook.
Presto enables self-service ad-hoc analytics on large amounts of data. With Presto, you can query data where it lives, including Hive, Amazon S3, Hadoop, Cassandra, relational databases, NoSQL databases, and even proprietary data stores. A single Presto query can combine data from multiple sources, allowing for analytics across your entire organization.
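As an illustration of a single query spanning multiple sources, here is a minimal sketch in Python that assembles a federated Presto SQL statement. Presto addresses tables as catalog.schema.table, so a join across two catalogs reads like an ordinary join. The catalog names (hive, mysql), the schemas, tables, and columns are all hypothetical and would need to match the connectors configured on your own cluster.

```python
def build_federated_query(hive_catalog: str = "hive",
                          mysql_catalog: str = "mysql") -> str:
    """Return a Presto SQL statement that joins tables from two catalogs.

    All schema, table, and column names here are hypothetical placeholders.
    """
    return (
        "SELECT c.region, SUM(o.total) AS revenue\n"
        # Orders are assumed to live in a data lake table exposed by a Hive catalog.
        f"FROM {hive_catalog}.sales.orders AS o\n"
        # Customers are assumed to live in a relational database catalog.
        f"JOIN {mysql_catalog}.crm.customers AS c\n"
        "  ON o.customer_id = c.id\n"
        "GROUP BY c.region"
    )


if __name__ == "__main__":
    print(build_federated_query())
```

The resulting string can be submitted through any Presto client, such as the Presto CLI or a JDBC connection.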
AWS and Presto are a powerful combination. If you want to run PrestoDB on AWS, it’s easy to spin up a managed Presto cluster on Amazon EMR. You can do this through the AWS Management Console, the AWS CLI, or the Amazon EMR API. Launching Presto on EMR from the AWS CLI is not difficult.
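For the AWS CLI route, the sketch below builds an aws emr create-cluster command with Presto as the installed application. It only assembles and prints the command; it does not call AWS. The cluster name, release label, instance type, and instance count are assumptions chosen for illustration, so check them against the EMR releases and instance types available in your account.

```python
import shlex


def emr_presto_cli_args(name: str = "presto-demo",
                        release_label: str = "emr-6.15.0",
                        instance_type: str = "m5.xlarge",
                        instance_count: int = 3) -> list:
    """Build the argument list for an EMR cluster with Presto installed.

    Default values are illustrative assumptions, not recommendations.
    """
    return [
        "aws", "emr", "create-cluster",
        "--name", name,
        "--release-label", release_label,
        # Ask EMR to install Presto on the cluster.
        "--applications", "Name=Presto",
        "--instance-type", instance_type,
        "--instance-count", str(instance_count),
        # Use the default EMR service role and EC2 instance profile.
        "--use-default-roles",
    ]


if __name__ == "__main__":
    # Print a shell-safe command line that can be reviewed before running.
    print(shlex.join(emr_presto_cli_args()))
```

Printing the command first, rather than executing it, makes it easy to review the settings before any billable resources are created.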
You can also give Ahana Cloud a try. Ahana is a managed service for Presto that takes care of the DevOps for you and provides everything you need to build your SQL Data Lakehouse using Presto.
Running Presto on AWS gives you the flexibility, scalability, performance, and cost-effectiveness of the cloud while allowing you to take advantage of Presto’s distributed query engine.
How does PrestoDB on AWS work?
This is another very common question. The quickest answer is that PrestoDB is the compute engine that sits on top of the data storage of your SQL Data Lakehouse. In this case, the storage is Amazon S3. See the image below for an overview.
Some AWS services work with PrestoDB on AWS, such as Amazon EMR and Amazon Athena. These are the main Amazon services for deploying Presto in the cloud. They are managed services that do the integration, testing, setup, configuration, and cluster tuning for you. Athena and EMR are widely used, but both come with some challenges, such as price performance and cost.
There are some differences between Presto on EMR and Athena. Amazon EMR lets you provision as many compute instances as you want within minutes. Amazon Athena lets you run Presto on AWS’s serverless platform, with no servers, virtual machines, or clusters to set up, manage, or tune.
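To make the serverless side of that contrast concrete, here is a minimal sketch of the request parameters for submitting a query to Athena. There is no cluster to describe, only the SQL, a database, and an S3 location for results. The parameter names follow the Athena StartQueryExecution API as exposed by boto3; the database name, table name, and bucket are hypothetical, and the snippet only builds the request rather than sending it.

```python
def athena_query_request(sql: str, database: str, output_location: str) -> dict:
    """Build keyword arguments for Athena's StartQueryExecution call.

    No instance types or counts appear here because Athena is serverless.
    """
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


if __name__ == "__main__":
    request = athena_query_request(
        sql="SELECT COUNT(*) FROM orders",          # hypothetical table
        database="sales",                           # hypothetical database
        output_location="s3://example-bucket/athena-results/",  # hypothetical bucket
    )
    # With boto3 installed and credentials configured, this could be sent as:
    #   boto3.client("athena").start_query_execution(**request)
    print(request)
```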
Many Amazon Athena users run into issues, however, when it comes to scale and concurrent queries. Comparing Amazon Athena with Presto is common, and many users weigh a service like Athena against running PrestoDB themselves. Learn more about those challenges and why users are moving to Ahana Cloud, SaaS for PrestoDB on AWS.
To get started quickly with Presto for your SQL Data Lakehouse on AWS, check out the services from Ahana Cloud. Ahana offers two versions of its solution: a Full-Edition and a Free-Forever Community Edition. Each option includes components of the SQL Lakehouse, as well as support from Ahana. Explore Ahana’s managed service for PrestoDB.
Presto was originally designed to run interactive queries against data warehouses, but now it has evolved into a unified SQL engine on top of open data lake analytics for both interactive and batch workloads.
Both AWS Athena and Ahana Cloud are based on the popular open-source Presto project. The biggest difference between the two is that Athena uses a serverless architecture, while Ahana Cloud is a managed service for Presto servers.
In this blog, we discuss AWS Athena vs Presto and some of the reasons, such as AWS pricing, why you might choose to deploy PrestoDB on your own instead of using the AWS Athena service.