Welcome to our blog series comparing AWS Athena, a serverless Presto service, with open source PrestoDB. In this series we’ll compare the two and discuss some of the reasons you might choose to deploy PrestoDB yourself instead of using the AWS Athena service. We hope you find this series helpful.
AWS Athena is an interactive query service built on PrestoDB that developers use to query data stored in Amazon S3 using standard SQL. It has a serverless architecture, and Athena users pay per query (it’s priced at $5 per terabyte scanned). Athena’s most common limits are technical ones, including query limits, concurrent query limits, and partition limits. These limits can slow workloads down and increase operational costs. Plus, AWS Athena is built on an old version of PrestoDB and only supports a subset of PrestoDB features.
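Because Athena bills by the amount of data scanned, a rough per-query cost estimate is simple arithmetic. The sketch below is a minimal illustration, assuming decimal terabytes (10**12 bytes) and ignoring any minimum-charge or rounding rules AWS may apply; `estimate_query_cost` is a hypothetical helper, not part of any AWS SDK.

```python
# Sketch: estimate Athena's per-query charge from bytes scanned.
# Assumptions (not from the article): 1 TB is treated as 10**12 bytes,
# and minimum-charge or rounding rules are ignored.

PRICE_PER_TB_USD = 5.0   # price quoted in the article
BYTES_PER_TB = 10**12    # assumption: decimal terabytes


def estimate_query_cost(bytes_scanned: int, price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Return the estimated cost in USD of a query that scans `bytes_scanned` bytes."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    return bytes_scanned / BYTES_PER_TB * price_per_tb


if __name__ == "__main__":
    # A query that scans 2.5 TB
    print(estimate_query_cost(2_500_000_000_000))  # 12.5
```

The same arithmetic explains why partitioning and columnar formats matter so much on Athena: every byte a query avoids scanning is a byte you don’t pay for.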
An overview of AWS Athena limits
AWS Athena query limits can cause problems, and many data engineering teams have spent hours trying to diagnose them. Some limits are hard, while others are soft quotas that you can ask AWS to increase. One major limitation concerns query submission and concurrency: Athena users can only submit one query at a time and, by default, can only run up to five queries simultaneously per account.
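One way to stay under a per-account concurrency limit is to throttle queries on the client side. The sketch below is a minimal illustration using a semaphore, assuming a hypothetical `fake_submit_query` function that stands in for a real Athena client call and blocks until the query finishes; the limit of five mirrors the default figure above and should be set to your account’s actual quota.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Default per-account figure quoted above (assumption: replace with your actual quota).
MAX_CONCURRENT_QUERIES = 5


class QueryThrottle:
    """Caps how many queries this client has in flight at once."""

    def __init__(self, limit: int) -> None:
        self._slots = threading.BoundedSemaphore(limit)
        self._lock = threading.Lock()
        self.in_flight = 0
        self.peak = 0  # highest concurrency observed, for monitoring

    def run(self, submit_query: Callable[[str], str], sql: str) -> str:
        with self._slots:  # blocks while `limit` queries are already running
            with self._lock:
                self.in_flight += 1
                self.peak = max(self.peak, self.in_flight)
            try:
                return submit_query(sql)  # must block until the query completes
            finally:
                with self._lock:
                    self.in_flight -= 1


def fake_submit_query(sql: str) -> str:
    """Hypothetical stand-in for a real Athena client call."""
    time.sleep(0.05)  # simulate query runtime
    return f"done: {sql}"


if __name__ == "__main__":
    throttle = QueryThrottle(MAX_CONCURRENT_QUERIES)
    queries = [f"SELECT {i}" for i in range(12)]
    with ThreadPoolExecutor(max_workers=12) as pool:
        results = list(pool.map(lambda q: throttle.run(fake_submit_query, q), queries))
    print(len(results), "queries finished; peak concurrency:", throttle.peak)
```

Because Athena query submission is asynchronous, a real `submit_query` would need to poll until the query reaches a finished state before returning; releasing the semaphore right after submission would not actually limit concurrency.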
AWS Athena query limits
AWS Athena Data Definition Language (DDL, like CREATE TABLE statements) and Data Manipulation Language (DML, like DELETE and INSERT) have the following limits:
1. Athena DDL max query limit: 20 active DDL queries.
2. Athena DDL query timeout limit: The Athena DDL query timeout is 600 minutes.
3. Athena DML query limit: By default, Athena allows 25 DML queries (running and queued) in US East and 20 DML queries in all other Regions.
4. Athena DML query timeout limit: The Athena DML query timeout limit is 30 minutes.
5. Athena query string length limit: The Athena query string hard limit is 262,144 bytes.
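Note that the query string limit is measured in bytes, not characters, so a query containing many multi-byte characters can exceed it even when its character count looks safe. A minimal pre-flight check, assuming the query text is sent as UTF-8 (`query_fits` is a hypothetical helper):

```python
ATHENA_MAX_QUERY_BYTES = 262_144  # hard limit quoted above


def query_fits(sql: str, limit: int = ATHENA_MAX_QUERY_BYTES) -> bool:
    """Return True if `sql` is within the byte limit once encoded as UTF-8.

    len(sql) counts characters, not bytes, so the string is encoded first.
    Assumption: the query text is sent as UTF-8.
    """
    return len(sql.encode("utf-8")) <= limit


if __name__ == "__main__":
    ascii_query = "a" * 262_144
    accented = "\u00e9" * 131_073  # 131,073 characters, 2 bytes each in UTF-8
    print(query_fits(ascii_query))  # True
    print(query_fits(accented))     # False: 262,146 bytes
```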
Learn more about Athena query limits
We have put together a deep dive into Athena query limits in Part 2 of this series.
AWS Athena partition limits
- Athena users can use AWS Glue, a data catalog and ETL service. Athena’s partition limit is 20,000 partitions per table, while Glue’s limit is 1,000,000 partitions per table.
- A CREATE TABLE AS SELECT (CTAS) or INSERT INTO query can create at most 100 partitions in a destination table. To work around this limit, you must split your data manually and run a series of INSERT INTO statements that each write up to 100 partitions.
Athena database limits
AWS Athena also has the following database and S3 bucket limits:
1. The Amazon S3 bucket limit is 100 buckets per account by default; you can request an increase to up to 1,000 S3 buckets per account.
2. Athena restricts each account to 100* databases, and each database to 100* tables.
*Note: Athena has since raised these limits to 10,000 databases per account and 200,000 tables per database.
AWS Athena open-source alternative
Deploying your own PrestoDB cluster
One alternative to AWS Athena is to deploy your own PrestoDB cluster. AWS Athena is built on an old version of PrestoDB; in fact, it’s about 60 releases behind the PrestoDB project. Newer features are likely to be missing from Athena, which supports only a subset of PrestoDB features to begin with.
Deploying and managing PrestoDB yourself means you won’t face AWS Athena limitations such as concurrent query limits, database limits, table limits, and partition limits. You can also run the latest version of PrestoDB. PrestoDB is an open source project hosted by the Linux Foundation’s Presto Foundation, and it has a transparent, open, and neutral community.
If deploying and managing PrestoDB on your own is not an option (time, resources, expertise, etc.), Ahana can help.
Ahana Cloud for Presto: A fully managed service
Ahana Cloud for Presto is a fully managed Presto cloud service without the limits of AWS Athena.
You use Ahana to query and analyze AWS data lakes stored in Amazon S3, as well as many other data sources, using the latest version of PrestoDB. Ahana is cloud-native and runs on Amazon Elastic Kubernetes Service (EKS), helping you reduce operational costs through automated cluster management, speed, and ease of use. Ahana is a SaaS offering with a beautiful, easy-to-use console UI. Anyone at any knowledge level can use it with ease: there is zero configuration effort and no configuration files to manage. Many companies have moved from AWS Athena to Ahana Cloud.
Check out the case study from ad tech company Carbon on why they moved from AWS Athena to Ahana Cloud for better query performance and more control over their deployment.