Welcome to our blog series comparing AWS Athena, a serverless Presto service, with open source PrestoDB. We'll also discuss some of the reasons why you might choose to deploy PrestoDB yourself rather than use the AWS Athena service. We hope you find this series helpful.
AWS Athena is an interactive query service built on PrestoDB that developers use to query data stored in Amazon S3 with standard SQL. Its serverless architecture is a benefit, but one drawback is cost: users pay per query, currently $5 per terabyte of data scanned. Athena also imposes technical limits, including query limits, concurrent query limits, and partition limits, which can constrain performance and increase operational costs. In addition, AWS Athena is built on an older version of PrestoDB and supports only a subset of PrestoDB features.
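To make the pay-per-scan model concrete, here is a rough cost estimator. It is a sketch, not AWS's billing logic: it assumes the $5 per TB price quoted above, that scanned bytes are rounded up to the nearest megabyte with a 10 MB minimum per query (as AWS's pricing page describes at the time of writing), and binary units (1 TB = 2**40 bytes). Check current AWS pricing before relying on the numbers.

```python
# Assumptions (verify against current AWS pricing): $5 per TB scanned, bytes
# rounded up to the nearest MB, 10 MB minimum per query, binary units.
PRICE_PER_TB = 5.00
MB = 2**20
TB = 2**40
MIN_BILLED_BYTES = 10 * MB


def billed_bytes(bytes_scanned: int) -> int:
    """Round up to a whole megabyte, then apply the per-query minimum."""
    rounded = -(-bytes_scanned // MB) * MB  # integer ceiling division
    return max(rounded, MIN_BILLED_BYTES)


def query_cost(bytes_scanned: int, price_per_tb: float = PRICE_PER_TB) -> float:
    """Estimated charge in USD for a single query."""
    return billed_bytes(bytes_scanned) / TB * price_per_tb


def monthly_cost(queries_per_day: int, avg_bytes_scanned: int, days: int = 30) -> float:
    """Estimated monthly charge for a steady query workload."""
    return queries_per_day * days * query_cost(avg_bytes_scanned)


# Example: a dashboard issuing 200 queries a day, each scanning about 50 GB.
print(round(monthly_cost(200, 50 * 2**30), 2))  # 1464.84
```

Because the charge scales linearly with bytes scanned, the same workload costs the same whether it runs fast or slow; only scanning less data lowers the bill.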
An overview of AWS Athena limits
AWS Athena query limits can cause problems, and many data engineering teams have spent hours diagnosing them. Most of Athena's limits are difficult to work around, but some are soft quotas, which you can ask AWS to raise. One big issue is Athena's restriction on queries: by default, Athena users can submit only one query at a time and run at most five queries simultaneously per account.
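One common way to live within a concurrency limit is to throttle on the client side, so your own jobs queue locally instead of piling up against the account quota. The sketch below uses a bounded semaphore; the limit of 5 mirrors the default quoted above and should be treated as an assumption about your account, and `submit` is a caller-supplied function (hypothetical here) that starts a query and blocks until it finishes.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Assumption: the per-account default quoted above; replace with your
# account's actual (possibly raised) quota.
MAX_CONCURRENT_QUERIES = 5


class QueryThrottle:
    """Caps the number of queries this process has in flight at once."""

    def __init__(self, limit: int = MAX_CONCURRENT_QUERIES):
        self._slots = threading.BoundedSemaphore(limit)
        self._lock = threading.Lock()
        self.in_flight = 0
        self.peak = 0  # highest concurrency observed, for monitoring

    def run(self, submit, sql):
        """Wait for a free slot, then call submit(sql) and return its result."""
        with self._slots:
            with self._lock:
                self.in_flight += 1
                self.peak = max(self.peak, self.in_flight)
            try:
                return submit(sql)
            finally:
                with self._lock:
                    self.in_flight -= 1


# Usage with a stand-in for a real query runner.
def fake_submit(sql):
    time.sleep(0.05)  # pretend the query takes a moment
    return f"done: {sql}"


throttle = QueryThrottle(limit=MAX_CONCURRENT_QUERIES)
queries = [f"SELECT {i}" for i in range(20)]
with ThreadPoolExecutor(max_workers=20) as pool:
    results = list(pool.map(lambda q: throttle.run(fake_submit, q), queries))

print(len(results), throttle.peak)  # peak never exceeds the limit
```

A pool with `max_workers=5` would also cap concurrency, but a shared semaphore keeps the cap in force even when several pools or threads in the same process submit queries.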
AWS Athena query limits
AWS Athena Data Definition Language (DDL, like CREATE TABLE statements) and Data Manipulation Language (DML, like DELETE and INSERT) have the following limits:
1. Athena DDL max query limit: 20 active DDL queries.
2. Athena DDL query timeout limit: The Athena DDL query timeout is 600 minutes.
3. Athena DML query limit: By default, Athena allows 25 DML queries (running and queued) in US East and 20 DML queries in all other Regions.
4. Athena DML query timeout limit: The Athena DML query timeout limit is 30 minutes.
5. Athena query string length limit: The Athena query string hard limit is 262,144 bytes.
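The query string limit tends to bite when a query is generated from a long list of values. A minimal pre-flight sketch, assuming the 262,144-byte limit is measured in UTF-8 bytes and using a hypothetical `{values}` placeholder convention, splits a long `IN` list across several queries so that each one stays under the limit. It does not address the runtime timeouts above.

```python
# The hard limit quoted above. Assumption: it is measured in UTF-8 bytes.
MAX_QUERY_BYTES = 262_144


def query_size(sql: str) -> int:
    return len(sql.encode("utf-8"))


def fits_query_limit(sql: str, max_bytes: int = MAX_QUERY_BYTES) -> bool:
    return query_size(sql) <= max_bytes


def sql_literal(value: str) -> str:
    """Quote a string as a SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def batch_in_list_queries(template: str, values: list[str],
                          max_bytes: int = MAX_QUERY_BYTES) -> list[str]:
    """Render template once per batch of values, keeping each query <= max_bytes.

    `template` uses a hypothetical "{values}" placeholder for the IN list.
    """
    base = query_size(template.replace("{values}", ""))
    batches, current, size = [], [], base
    for value in values:
        lit = sql_literal(value)
        lit_size = query_size(lit)
        if base + lit_size > max_bytes:
            raise ValueError(f"value {value!r} cannot fit in a single query")
        sep = 2 if current else 0  # ", " between literals
        if size + sep + lit_size > max_bytes:
            batches.append(current)  # flush the full batch
            current, size, sep = [], base, 0
        current.append(lit)
        size += sep + lit_size
    if current:
        batches.append(current)
    return [template.replace("{values}", ", ".join(b)) for b in batches]


# Example: 100,000 IDs are too many for one query string.
template = "SELECT * FROM events WHERE id IN ({values})"
ids = [str(i) for i in range(100_000)]
queries = batch_in_list_queries(template, ids)
print(len(queries), all(fits_query_limit(q) for q in queries))
```

The table and column names in the example are hypothetical. Replacing a huge `IN` list with a join against a staging table is often the cleaner fix; batching is the quick workaround.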
AWS Athena partition limits
- Athena users can use AWS Glue, a data catalog and ETL service, as their catalog. Athena's partition limit is 20,000 per table, while Glue's limit is 1,000,000 partitions per table.
- A CREATE TABLE AS (CTAS) or INSERT INTO query can create at most 100 partitions in a destination table. To work around this limit, you must manually split the load into a series of INSERT INTO queries that each write at most 100 partitions.
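The INSERT INTO workaround above is mechanical enough to script. This sketch generates one statement per batch of at most 100 partition values. It assumes a single partition column listed last in the SELECT, matching the target table's column order; the table, column, and date values are hypothetical.

```python
from datetime import date, timedelta

# Per the limit above: one CTAS or INSERT INTO can write at most 100 partitions.
MAX_PARTITIONS_PER_QUERY = 100


def chunks(items: list, size: int):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def partitioned_inserts(target: str, source: str, columns: list[str],
                        partition_col: str, partition_values: list[str],
                        batch_size: int = MAX_PARTITIONS_PER_QUERY) -> list[str]:
    """Build INSERT INTO statements that each touch at most batch_size partitions.

    Assumes a single partition column, placed last in the SELECT list.
    """
    if not 1 <= batch_size <= MAX_PARTITIONS_PER_QUERY:
        raise ValueError("batch_size must be between 1 and 100")
    select_list = ", ".join(columns + [partition_col])
    statements = []
    for batch in chunks(sorted(set(partition_values)), batch_size):
        in_list = ", ".join("'" + v.replace("'", "''") + "'" for v in batch)
        statements.append(
            f"INSERT INTO {target} SELECT {select_list} FROM {source} "
            f"WHERE {partition_col} IN ({in_list})"
        )
    return statements


# Example: backfilling 250 daily partitions takes three statements.
days = [(date(2024, 1, 1) + timedelta(days=i)).isoformat() for i in range(250)]
stmts = partitioned_inserts("analytics.events", "staging.events_raw",
                            ["user_id", "event_type"], "dt", days)
for s in stmts:
    print(s[:90] + "...")
```

With more than one partition column, the limit applies to distinct combinations of values, so each batch would need to be built from those combinations instead.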
Athena database limits
AWS Athena also has the following S3 bucket and database limits:
1. The Amazon S3 bucket limit is 100 buckets per account by default; you can request an increase to up to 1,000 S3 buckets per account.
2. Athena restricts each account to 100* databases, and each database can contain at most 100* tables.
*Note: Athena has since raised these limits to 10K databases per account and 200K tables per database.
Summary of Athena database limits:

| Limit | Value |
|---|---|
| Amazon S3 buckets | 1K per account |
| Databases | 10K per account |
| Tables per database | 200K |
AWS Athena open-source alternative
Deploying your own PrestoDB cluster
An Amazon Athena alternative is deploying your own PrestoDB cluster. Amazon Athena is built on an old version of PrestoDB; in fact, it's about 60 releases behind the PrestoDB project. Newer features are therefore likely to be missing from Athena, which supports only a subset of PrestoDB features to begin with.
Deploying and managing PrestoDB on your own means you won't face AWS Athena limitations such as concurrent query limits, database limits, table limits, and partition limits. Plus, you'll get the latest version of Presto. PrestoDB is an open source project hosted by The Linux Foundation's Presto Foundation, with a transparent, open, and neutral community.
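With your own cluster, clients talk directly to the Presto coordinator over its HTTP client protocol: POST the SQL to `/v1/statement`, then keep following `nextUri` until it disappears, collecting `data` along the way. A minimal sketch of that loop follows. The coordinator URL, user, catalog, and schema are hypothetical; the transport is injectable so the example runs offline; and a production client (such as the presto-python-client package) adds retries, backoff between polls, and authentication.

```python
import json
import urllib.request


def _http_fetch(method, url, body, headers):
    """Default transport: one HTTP request, JSON response body."""
    req = urllib.request.Request(url, data=body, headers=headers, method=method)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def presto_query(base_url, sql, user, catalog=None, schema=None, fetch=_http_fetch):
    """Submit sql to a Presto coordinator and collect every result row.

    POSTs the statement to /v1/statement, then follows nextUri until the
    coordinator stops returning one. No retries, backoff, or auth.
    """
    headers = {"X-Presto-User": user}
    if catalog:
        headers["X-Presto-Catalog"] = catalog
    if schema:
        headers["X-Presto-Schema"] = schema
    page = fetch("POST", f"{base_url}/v1/statement", sql.encode("utf-8"), headers)
    columns, rows = None, []
    while True:
        if "error" in page:
            raise RuntimeError(page["error"].get("message", "query failed"))
        if columns is None and "columns" in page:
            columns = [c["name"] for c in page["columns"]]
        rows.extend(page.get("data", []))
        next_uri = page.get("nextUri")
        if not next_uri:
            return columns, rows
        page = fetch("GET", next_uri, None, headers)


# Offline demo: canned pages standing in for a hypothetical coordinator.
BASE = "http://presto-coordinator:8080"
canned = {
    f"{BASE}/v1/statement": {"nextUri": f"{BASE}/v1/statement/q1/1"},
    f"{BASE}/v1/statement/q1/1": {
        "columns": [{"name": "region", "type": "varchar"}],
        "data": [["us-east-1"], ["eu-west-1"]],
        "nextUri": f"{BASE}/v1/statement/q1/2",
    },
    f"{BASE}/v1/statement/q1/2": {"data": [["ap-south-1"]]},
}
requests_made = []


def canned_fetch(method, url, body, headers):
    requests_made.append((method, url))
    return canned[url]


cols, rows = presto_query(BASE, "SELECT region FROM regions", "analyst",
                          catalog="hive", schema="default", fetch=canned_fetch)
print(cols, rows)
```

Because the coordinator streams results page by page, the client never needs the full result set in one response, and any concurrency cap comes from your cluster's own configuration rather than an account-wide service quota.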
If deploying and managing PrestoDB on your own is not an option (time, resources, expertise, etc.), Ahana can help.
Ahana Cloud for Presto: A fully managed service
Ahana Cloud for Presto is a fully managed Presto cloud service, without the limitations of AWS Athena.
You use Ahana to query and analyze AWS data lakes stored in Amazon S3, and many other data sources, using the latest version of PrestoDB. Ahana is cloud-native and runs on Amazon Elastic Kubernetes Service (EKS), helping you reduce operational costs through automated cluster management, speed, and ease of use. Ahana is a SaaS offering with a beautiful, easy-to-use console UI. Anyone at any knowledge level can use it: there is zero configuration effort and there are no configuration files to manage. Many companies have moved from AWS Athena to Ahana Cloud.
Learn how you can get better price/performance when querying S3: schedule a free consultation call with an Ahana solution architect.