Welcome to the 2nd blog in our blog series on comparing AWS Athena, a serverless Presto service, to open source PrestoDB. In this series we’ll discuss Amazon’s Athena service versus PrestoDB and some of the reasons why you might choose to deploy PrestoDB on your own instead of using the AWS Athena service.
Taking a look at AWS Athena Query Limits
Amazon Athena allows users to perform ad-hoc analytics on data lakes without the need for time-consuming Extract Transform Load (ETL) cycles. In order to maintain the availability of the service when processing such vast and diverse data, certain design decisions and limitations were introduced. Some of these are:
- Query string limit: 262,144 bytes.
- Some statements like EXPLAIN, MERGE, and UPDATE are not supported.
- Only one query per user is permitted, and 5 to 200 queries per account.
- Queries can timeout on tables with thousands of partitions.
- Cross-region querying is limited to sixteen regions.
- AWS Athena partition limits happen because a Create Table As (CTAS) or INSERT INTO query can create up to a maximum of 100 partitions in a destination table. To work around this limitation you must use a series of INSERT INTO or CTAS statements that insert up to 100 partitions each.
- Query timeouts can occur on tables with many thousands of partitions. This can happen when the table has many partitions that are not of type string.
- Athena concurrent queries limit occurs due to limits imposed on accounts to avoid users from overwhelming the regional service.
- Queries can be queued or canceled by the Athena service when exceeding a data threshold.
- An adjustable quota of 20 DDL (Data Definition Language, like CREATE TABLE) active queries (600-minute timeout) and 20-25 DML (Data Manipulation Language – like SELECT and CREATE TABLE AS) active queries (30-minute timeout) is permitted.
- Maximum 20,000 partitions per table.
- Maximum number of databases is 100. Each database can contain a maximum of 100 tables.
- Full-table scans are necessary due to the lack of indices.
- AWS Athena’s federated queries cannot be used with views, and the required Athena Data Source Connectors (that run on AWS Lambda) are complicated to setup.
- Nodes can run out of memory when sorting columns.
- There is no default support for stored procedures (but can be added with an adapter).
- Partition creation is restricted to 100 at once (workarounds exist).
- The same server pools run queries for multiple accounts, so queries tend to get enqueued based on the service load.
- ..and much more. You can read more details about AWS Athena’s limitations in my other blog post on the topic.
Configuring and fine-tuning Presto or working around the limitations of Amazon Athena takes a lot of experience. A solution like Ahana Cloud can help.
Ahana Cloud for Presto is a fully managed PrestoDB cloud service for AWS to query and analyze AWS data lakes stored in Amazon S3, and many other data sources. Unlike Athena, it uses the latest version of PrestoDB. Ahana is cloud-native and runs on Amazon Elastic Kubernetes (EKS), helping you to reduce operational costs with its automated cluster management, speed and ease of use. Ahana is a SaaS offering with an easy to use console UI. Anyone at any knowledge level can use it with ease, there is zero configuration effort and no configuration files to manage. Many companies have moved from AWS Athena to Ahana Cloud. You can try Ahana Cloud today as a free trial.
Check out the case study from ad tech company Carbon on why they moved from AWS Athena to Ahana Cloud for better query performance and more control over their deployment.