AWS Athena's performance issues are well documented, both in terms of unpredictability and speed. Many users have pointed out that even relatively lightweight queries on Athena will fail. Part of the issue may be the number of columns in the Group By clause: even a small number of columns (fewer than 5) can leave a query without enough resources to complete. Other times it may be the amount of data being parsed, and again even small amounts of data (less than 200MB) can leave a query without enough resources to complete.
Presto, the query engine Athena is built on, stores Group By columns in memory while it works to match rows with the same Group By key. The more columns in the Group By clause, the fewer rows get collapsed by the aggregation, so more distinct groups have to be held in memory at once. To address this problem, users have to reduce the number of columns in the Group By clause and retry the query.
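To make the memory effect concrete, here is a minimal Python sketch of a hash-style aggregation. It is only an illustration of the idea, not how Presto or Athena is actually implemented; the `hash_aggregate` function, the column names, and the sample rows are all made up for this example.

```python
from collections import defaultdict


def hash_aggregate(rows, group_by, value_col):
    """Sum value_col for each distinct combination of the group_by columns.

    Every distinct key needs its own entry in the in-memory table, so the
    number of entries is a rough stand-in for memory use.
    """
    groups = defaultdict(int)
    for row in rows:
        key = tuple(row[col] for col in group_by)
        groups[key] += row[value_col]
    return dict(groups)


# Hypothetical sample data, invented for illustration only.
rows = [
    {"region": "us", "device": "ios", "day": 1, "clicks": 3},
    {"region": "us", "device": "ios", "day": 2, "clicks": 5},
    {"region": "us", "device": "web", "day": 1, "clicks": 2},
    {"region": "eu", "device": "web", "day": 1, "clicks": 7},
    {"region": "eu", "device": "web", "day": 2, "clicks": 1},
]

# One Group By column: five rows collapse into two groups.
by_region = hash_aggregate(rows, ["region"], "clicks")

# Three Group By columns: no rows collapse, so five groups stay in memory.
by_all = hash_aggregate(rows, ["region", "device", "day"], "clicks")

print(len(by_region), len(by_all))
```

With one grouping column the five rows collapse into two entries; with three columns every row becomes its own entry. On real tables the same effect scales with the number of distinct key combinations, which is why dropping a column from the Group By clause can be enough to let a query finish.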
At still other times, the issue is not how long the query takes but whether it runs at all. Users who hit “internal errors” on a query in one hour re-run the same query later and it succeeds.
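Since re-running the same query often succeeds, one common workaround is to wrap query submission in a retry loop. The sketch below is an assumption-laden illustration rather than an Athena API: `run_query` is a hypothetical callable standing in for whatever client code submits the query, and it is assumed to raise `RuntimeError` when the service reports an internal error.

```python
import time


def run_with_retry(run_query, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call run_query, retrying with exponential backoff on RuntimeError.

    run_query is a hypothetical zero-argument callable; RuntimeError is an
    assumed stand-in for a transient "internal error" from the service.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return run_query()
        except RuntimeError:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error to the caller.
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

A retry loop only hides the unpredictability; it does not remove it. Each failed attempt still costs time, so queries that need a predictable completion time are not helped much by this approach.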
Ultimately, AWS Athena is not predictable when it comes to query performance. For those who want to take advantage of Presto and get consistent, predictable query performance they can control, Ahana Cloud provides a managed service for Presto that runs in AWS.