This is the 4th blog in our comparing AWS Athena to PrestoDB series. If you missed the others, you can find them here:
What is Athena?
Amazon Athena is an interactive query service based on Presto that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage. Athena is great for interactive querying on datasets already residing in S3 without the need to move the data into another analytics database or a cloud data warehouse. Athena (engine 2) also provides federated query capabilities, which allows you to run SQL queries across data stored in relational, non-relational, object, and custom data sources.
Why would I not want to use Athena?
There are various reasons users look for alternative options to Athena, in spite of its advantages:
- Performance consistency: Athena is a shared, serverless, multi-tenant service deployed per-region. If too many users leverage the service at the same time in a region, users across the board start seeing query queuing and latencies. Query concurrency can be challenging due to limits imposed on accounts to avoid users from overwhelming the regional service.
- Cost per query: Athena charges based on Terabytes of data scanned ($5 per TB). If your datasets are not very large, and you don’t have a lot of users querying the data often, Athena is the perfect solution for your needs. If however, your datasets are large in the order of hundreds or thousands of queries, scanning over terabytes or petabytes of data Athena may not be the most cost-effective choice.
- Visibility and Control: There are no knobs to tweak in terms of capacity, performance, CPU, or priority for the queries. You have no visibility into the underlying infrastructure or even into the details as to why the query failed or how it’s performing. This visibility is important from a query tuning and consistency standpoint and even to reduce the amount of data scanned in a query.
- Security: In spite of having access controls via IAM and other AWS security measures, some customers simply want better control over the querying infrastructure and choose to deploy a solution that provides better manageability, visibility, and control.
- Feature delays: Presto is evolving at an expedited rate, with new performance features, SQL functions, and optimizations being contributed by the community as well as companies such as Facebook, Alibaba, Uber, and others periodically. Amazon caught up with version 0.217 only in Nov 2020. With the current version of Presto DB being 0.248, if you need the performance, features, and efficiencies that newer versions provide you are going to have to wait for some time.
What are the typical alternatives to Athena?
Depending upon a user’s business need and the level of control desired users, leverage one or more of the following options:
DIY open-source PrestoDB
Instead of using Athena, users deploy open-source PrestoDB in their environment (either On-Premises or in the Cloud). This mode of deployment gives the user the most amount of flexibility in terms of performance, price, and security; however, it comes at a cost. Managing a PrestoDB deployment requires expertise and resources (personnel and infrastructure) to tweak, manage and monitor the deployment.
Large scale DIY PrestoDB deployments do exist at enterprises that have mastered the skills of managing large-scale distributed systems such as Hadoop. These are typically enterprises maintaining their own Hadoop clusters or companies like FAANG (Facebook, Amazon, Apple, Netflix, Google) and tech-savvy startups such as Uber, Pinterest, just to name a few.
The cost of managing an additional PrestoDB cluster may be incremental for a customer already managing large distributed systems, however, for customers starting from scratch, this can be an exponential increase in cost.
Managed Hadoop and Presto
Cloud providers such as AWS, Google, and Azure provide their own version of Managed Hadoop.
AWS provides EMR (Elastic Map Reduce), Google provides Data Proc and Azure provides HDInsight. These cloud providers support compatible versions of Presto that can be deployed on their version of Hadoop.
This option provides a “middle ground” where you are not responsible for managing and operating the infrastructure as you would traditionally do in a DIY model, but instead are only responsible for the configuration and tweaks required. Cloud provider-managed Hadoop deployments take over most responsibilities of cluster management, node recovery, and monitoring. Scale-out becomes easier at the push of a button, as costs can be further optimized by autoscaling using either on-demand or spot instances.
You still need to have the expertise to get the most of your deployment by tweaking configurations, instance sizes, and properties.
Managed Presto Service
If you would rather not deal with what AWS calls the “undifferentiated heavy lifting”, a Managed Presto Cloud Service is the right solution for you.
Ahana Cloud provides a fully managed Presto cloud service, with a wide range of native Presto connectors support, IO caching, optimized configurations for your workload. An expert service team can also work with you to help tune your queries and get the most out of your Presto deployment. Ahana’s service is cloud-native and runs on Amazon’s Elastic Kubernetes Service (EKS) to provide resiliency, performance, scalability and also helps reduce your operational costs.
A managed Presto Service such as Ahana gives you the visibility you need in terms of query performance, instance utilization, security, auditing, query plans as well as gives you the ability to manage your infrastructure with the click of a button to meet your business needs. A cluster is preconfigured with optimum defaults and you can tweak only what is necessary for your workload. You can choose to run a single cluster or multiple clusters. You can also scale up and down depending upon your workload needs.
Ahana is a premier member of the Linux Foundation’s Presto Foundation and contributes many features back to the open-source Presto community, unlike Athena, Presto EMR, Data Proc, and HDInsight.
You have a wide variety of options regarding your use of PrestoDB.
If maximum control is what you need and you can justify the costs of managing a large team and deployment, then DIY implementation is right for you.
On the other hand, if you don’t have the resources to spin up a large team but still want the ability to tweak most tuning knobs, then a managed Hadoop with Presto service may be the way to go.
If simplicity and accelerated go-to-market are what you seek without needing to manage a complex infrastructure, then Ahana’s Presto managed service is the way to go. Sign up for our free trial today.