Understanding AWS Athena Costs with Examples
Athena costs can be unpredictable and hard to control. Ahana is the PrestoDB solution that gives you back control over both performance and your cloud bill with pay-as-you-go pricing and the ability to provision more resources to tackle harder queries.
What Is Amazon Athena?
Since you’re reading this to understand Athena costs, you likely already know what it is, so we’ll touch on it only briefly. Amazon Athena is a managed, serverless version of Presto. It provides a SQL query engine for analyzing unstructured data in AWS S3. Because the service has no dedicated resources, it does not perform consistently, so it fits best where reliable speed and scalability are not particularly important: testing ideas, small use cases, and quick ad-hoc analysis.
How Much Does AWS Athena Cost?
An Athena query costs from $5 to $7 per terabyte scanned, depending on the region. Most materials quote only the $5 figure, but some regions cost $7, so keep that in mind. For our examples, we’ll use $5 per terabyte as our base. Failed queries are not charged, but other charges, such as S3 storage, apply as usual for any services you use.
AWS Athena Pricing Example
In this example, based on the Amazon Athena pricing calculator, we assume one query per work day, so 20 queries a month, with each query scanning 4 TB of data. The cost works out as follows. At $5 per TB scanned, a query that scans 4 TB costs $20. Running that query 20 times per month gives 20 * $20 = $400 per month.
|Price per TB scanned:|$5|
|Queries per month:|20|
|TB of data scanned, per query:|4|
|Total monthly cost:|$400|
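The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration: the function name `athena_monthly_cost` and its parameters are our own and not part of any AWS tool, and the sketch ignores details such as per-query minimum charges.

```python
def athena_monthly_cost(tb_per_query, queries_per_month, price_per_tb=5.0):
    """Estimate monthly Athena spend in USD from data scanned.

    Hypothetical helper: price_per_tb defaults to the $5 base rate used
    in this article; some regions are priced higher.
    """
    cost_per_query = tb_per_query * price_per_tb
    return cost_per_query * queries_per_month


# One 4 TB query per work day, roughly 20 work days a month.
print(athena_monthly_cost(tb_per_query=4, queries_per_month=20))  # 400.0
```

Passing `price_per_tb=7.0` gives the estimate for one of the more expensive regions.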
You can mitigate these costs by storing your data compressed, if that is an option for you. A very conservative 2:1 compression ratio would cut your costs in half, to just $200 per month. If you also store your data in a columnar format like ORC or Parquet, you can reduce your costs even further, because Athena then scans only the columns you need instead of the entire row every time. Applying the same 50% notion, where a query now reads only half of the remaining data, brings the cost down to $100 per month.
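To see how the two savings stack, here is a minimal sketch. The helper `scanned_tb`, the 2:1 compression ratio, and the 50% column fraction are illustrative assumptions taken from the example above, not measured values; real ratios depend heavily on your data and queries.

```python
PRICE_PER_TB = 5.0        # base rate used in this article
QUERIES_PER_MONTH = 20    # one query per work day
RAW_TB_PER_QUERY = 4.0    # uncompressed, row-based data


def scanned_tb(raw_tb, compression_ratio=1.0, column_fraction=1.0):
    """TB a query actually scans after compression and column pruning.

    compression_ratio: 2.0 means 2:1 compression (assumed).
    column_fraction: share of the data still read when only some
    columns are needed (0.5 is the article's illustrative value).
    """
    return raw_tb / compression_ratio * column_fraction


scenarios = {
    "uncompressed, row-based": scanned_tb(RAW_TB_PER_QUERY),
    "compressed 2:1": scanned_tb(RAW_TB_PER_QUERY, compression_ratio=2.0),
    "compressed 2:1 + columnar": scanned_tb(
        RAW_TB_PER_QUERY, compression_ratio=2.0, column_fraction=0.5
    ),
}

for label, tb in scenarios.items():
    monthly = tb * PRICE_PER_TB * QUERIES_PER_MONTH
    print(f"{label}: {tb} TB per query, ${monthly:.0f} per month")
```

Because the two reductions multiply, each one halves whatever is left rather than subtracting a fixed amount.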
Let’s try a larger example, though not an extreme one if you use a data lake for serious processing. Say you run 20 queries per day against 100 TB of uncompressed, row-based data. At $5 per TB, each query costs $500, or $10,000 per day. The monthly figure below implies an average month of about 30.4 days, which is 608 queries:
That’s right, $304,000 per month. Twenty queries per day isn’t even unrealistic if several departments run dashboard queries to get updates on various metrics.
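For readers who want to check the figure, the sketch below reproduces it. The 30.4 days-per-month value is our assumption: it is the average month length that makes the arithmetic match $304,000, and the function name is hypothetical.

```python
DAYS_PER_MONTH = 30.4  # assumed average month length; yields the quoted figure


def monthly_cost_from_daily(tb_per_query, queries_per_day,
                            price_per_tb=5.0, days_per_month=DAYS_PER_MONTH):
    """Estimate monthly Athena spend in USD when queries run every day."""
    queries_per_month = queries_per_day * days_per_month
    return tb_per_query * price_per_tb * queries_per_month


# 20 queries a day, each scanning 100 TB of uncompressed, row-based data.
estimate = monthly_cost_from_daily(tb_per_query=100, queries_per_day=20)
print(f"${estimate:,.0f} per month")
```

Changing `days_per_month` to a work-day count such as 20 shows how sensitive the total is to how often the queries actually run.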
While walking through the details of Athena pricing, we also saw how easy it is to get hit with a giant bill unexpectedly. If you haven’t compressed or reformatted your data to reduce those costs and have just dumped a bunch of CSV or JSON files into S3, you can be in for a nasty surprise. The same is true if you open Athena up to your data consumers without any controls and they fire off a lot of queries on a lot of data. Fortunately, it’s not hard to figure out what the cost will be for specific usage, and Amazon has provided the tools to do it.
If you’re an Athena user who’s not happy with costs, you’re not alone. We see many Athena users wanting more control over their deployment and in turn, costs. That’s where we can help – Ahana is SaaS for Presto (the same technology that Athena is running) that gives you more control over your deployment. Typically our customers see up to 5.5X price performance improvements on their queries as compared to Athena.
Learn how you can get better price/performance when querying S3: schedule a free consultation call with an Ahana solution architect.