Understanding AWS Athena Costs with Examples
What Is Amazon Athena?
Since you’re reading this to understand Athena costs, you likely already know what it is, so we’ll touch on it only briefly. Amazon Athena is a managed, serverless version of Presto. It provides a SQL query engine for analyzing unstructured data in AWS S3. Because the service has no dedicated resources, it does not perform consistently, so it fits best where reliable speed and scalability are not particularly important. Testing ideas, small use cases and quick ad-hoc analysis are where it makes the most sense.
How Athena charges work
An Athena query costs from $5 to $7 per terabyte scanned, depending on the region. Most materials you read will only quote the $5 figure, but some regions cost $7, so keep that in mind. For our examples, we’ll use $5 per terabyte as our base. Failed queries are not charged, but charges for the other services you use, such as S3 storage, apply as usual.
This example comes from the Amazon Athena pricing calculator. We assume 1 query per work day, so 20 queries a month, with each query scanning 4 TB of data. The cost per query works out as follows:
$5 per TB scanned * 4 TB scanned = $20 per query
So if we are doing that query 20 times per month, then we have 20 * $20 = $400 per month
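The arithmetic above can be captured in a few lines of Python. This is only a sketch for estimating query charges; the function name and parameters are our own, not part of any AWS tool, and it ignores S3 storage and any other charges.

```python
def athena_monthly_cost(price_per_tb: float, tb_per_query: float,
                        queries_per_month: int) -> float:
    """Estimate the monthly Athena query charge in USD.

    Hypothetical helper for illustration: price per TB scanned, times
    TB scanned per query, times the number of queries in the month.
    """
    cost_per_query = price_per_tb * tb_per_query
    return cost_per_query * queries_per_month


if __name__ == "__main__":
    # The example from the text: $5 per TB, 4 TB per query, 20 queries a month.
    print(athena_monthly_cost(5.0, 4.0, 20))  # 400.0
    # The same workload in a $7-per-TB region.
    print(athena_monthly_cost(7.0, 4.0, 20))  # 560.0
```

Swapping in the $7 rate shows why the region matters: the same workload goes from $400 to $560 per month.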
You can mitigate these costs by storing your data compressed, if that is an option for you. A very conservative 2:1 compression ratio would halve the data scanned and cut the $400 monthly cost to just $200. If you also store your data in a columnar format like ORC or Parquet, you can reduce your costs even further, because a query scans only the columns it needs instead of the entire row every time. If we assume again that a query now reads only half of the data, the cost drops to $100 per month.
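To see how the two savings stack, here is a small Python sketch. The function and its parameters are hypothetical names chosen for illustration, and the 2:1 compression ratio and 50% column fraction are the assumptions used in the text, not measured values; real savings depend on your data and queries.

```python
def reduced_monthly_cost(baseline_cost: float,
                         compression_ratio: float = 1.0,
                         column_fraction: float = 1.0) -> float:
    """Scale a baseline cost by the share of data still scanned.

    compression_ratio: 2.0 means the data shrinks to half its size.
    column_fraction: share of the data a query reads in a columnar
    format (0.5 means only half the columns' worth of data is scanned).
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    if not 0 < column_fraction <= 1:
        raise ValueError("column_fraction must be in (0, 1]")
    return baseline_cost / compression_ratio * column_fraction


if __name__ == "__main__":
    baseline = 400.0  # monthly cost of the uncompressed, row-based example
    print(reduced_monthly_cost(baseline, compression_ratio=2.0))  # 200.0
    print(reduced_monthly_cost(baseline, compression_ratio=2.0,
                               column_fraction=0.5))              # 100.0
```

Because Athena bills on data scanned, both factors simply multiply: halving the data twice leaves a quarter of the original bill.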
Let’s try a larger example, though not an extreme one if you are using a data lake for serious processing. Let’s say you have 20 queries per day, and you are working on 100TB of uncompressed, row-based data:
Each query costs $5 * 100 TB = $500, so 20 queries come to $10,000 per day. Over an average month of about 30.4 days, that is $304,000 per month. That’s right, $304,000 per month. Twenty queries per day isn’t unrealistic if a few departments want to run dashboard queries to get updates on various metrics.
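The larger figure can be reproduced with a short Python sketch. The helper name is our own, and the 30.4-day average month is an assumption inferred from the $304,000 total rather than a number stated in the original example.

```python
# Assumption: an average month of about 30.4 days, inferred from the
# $304,000 total rather than stated in the original example.
AVG_DAYS_PER_MONTH = 30.4


def monthly_cost_from_daily(price_per_tb: float, tb_per_query: float,
                            queries_per_day: int,
                            days_per_month: float = AVG_DAYS_PER_MONTH) -> float:
    """Estimate the monthly Athena query charge from a daily query count."""
    daily_cost = price_per_tb * tb_per_query * queries_per_day
    return daily_cost * days_per_month


if __name__ == "__main__":
    cost = monthly_cost_from_daily(5.0, 100.0, 20)
    print(f"${cost:,.0f} per month")  # $304,000 per month
```

Changing `queries_per_day` or `tb_per_query` makes it easy to see how quickly the bill grows as more consumers query more data.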
Along with the details of Athena pricing, these examples show how easy it is to get hit with a giant bill unexpectedly. If you just dumped a bunch of CSV or JSON files into S3 without compressing or reformatting the data to reduce those costs, you can be in for a nasty surprise. The same is true if you open up Athena connections to your data consumers without any controls and they fire off a lot of queries on a lot of data. The good news is that it’s not hard to figure out what the cost will be for specific usage, and Amazon has provided the tools to do it.
If you’re an Athena user who’s not happy with costs, you’re not alone. We see many Athena users wanting more control over their deployment and, in turn, their costs. That’s where we can help: Ahana is SaaS for Presto (the same technology that Athena is running) that gives you more control over your deployment. Typically our customers see up to 5.5X price-performance improvements on their queries as compared to Athena.
You can learn more about how Ahana compares to AWS Athena on this comparison page.