Today I am incredibly excited to announce Ahana Cloud for Presto, the first fully integrated, cloud-native managed service for Presto. It makes it simpler for cloud and data platform teams of all sizes to provide self-service SQL analytics for their data analysts and scientists. Before I share more about what Ahana Cloud for Presto does, I’d like to share why we built it and the problems it solves.
Data warehousing emerges
Data warehousing emerged in the 1990s as the internet was surging in popularity and data volumes were growing rapidly. The new competitive, constantly changing global economy required greater business intelligence, and there was a broad realization that data needed to be integrated to provide the critical business insights for decision-making. Adoption of Teradata, Oracle, Microsoft SQL Server and IBM DB2 warehouses exploded. That requirement for greater business insights has only grown since, and Snowflake’s tremendous IPO last week showed how data warehousing has evolved and moved to the cloud.
The first step of data warehousing involves continuously ingesting all data into a single database.
Once data is in the data warehouse, you can query it and report on it. Typically, these systems are closed source, with data stored in proprietary formats. Because of the technology and data lock-in, these systems are also very expensive.
An alternative architecture arises – the open federated, disaggregated stack
Over the past five years, while the traditional data warehousing approach of a tightly coupled database continued to be used, an alternative approach gained wide adoption among the most innovative technology companies: Facebook, Twitter, Uber, Netflix and others. A loosely coupled, disaggregated stack that enabled querying across many databases and data lakes became the dominant standard for their analytics, with the tightly coupled data warehousing approach relegated to legacy workloads.
This new SQL analytics stack is made of four elements: the query engine, metadata catalog, transaction manager, and storage engine. And Presto has emerged as the de facto query engine. Presto is a federated, distributed query engine created and open sourced by Facebook. It is designed to be extensible and pluggable, which led to its extensive connector ecosystem.
But why? Why is this disaggregated stack with Presto as the foundation the preferred choice for the most advanced technology companies who can afford to buy products off the shelf?
First, this federated, disaggregated stack addresses the new realities of data
- There is just too much data being generated and a single database is no longer the solution to support a wide range of analytics
- Data will be stored in data lakes, but other pertinent data will still reside in a range of other databases
- SQL analytics is needed both for the data lake, where raw data resides in cheap storage, and for the broad range of other databases where data continues to live
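To make the federated idea concrete, here is a minimal sketch of what a single query spanning a data lake and a database can look like. Presto addresses tables as catalog.schema.table, so one SQL statement can join across catalogs. The catalog names (`hive`, `mysql`), schemas, tables and the `customer_id` column are hypothetical and chosen only for illustration. The sketch only builds the SQL text and does not connect to a cluster.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table:
    """A fully qualified Presto table name: catalog.schema.table."""
    catalog: str  # hypothetical catalog name, one per configured data source
    schema: str
    table: str

    def fqn(self) -> str:
        return f"{self.catalog}.{self.schema}.{self.table}"


def federated_join(left: Table, right: Table, key: str) -> str:
    """Build one SQL statement that joins two tables living in different catalogs."""
    return (
        f"SELECT l.{key}, count(*) AS events\n"
        f"FROM {left.fqn()} AS l\n"
        f"JOIN {right.fqn()} AS r ON l.{key} = r.{key}\n"
        f"GROUP BY l.{key}"
    )


# Hypothetical sources: raw click data in the data lake, customers in an operational database.
clicks = Table("hive", "web", "clicks")
customers = Table("mysql", "crm", "customers")

query = federated_join(clicks, customers, "customer_id")
print(query)
```

The resulting statement could then be submitted through any Presto client or a JDBC / ODBC connection. The point is simply that both sources appear in the same query, with no ingestion step in between.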
Second, this federated, disaggregated stack is open
- Open source – PrestoDB under the Linux Foundation is completely open source under the Apache 2.0 license. This means that you benefit from the best innovations, not just from one vendor but from the entire community.
- Open formats – PrestoDB doesn’t use any proprietary formats. In fact, it supports most of the common formats like JSON, Apache ORC, Apache Parquet and others.
- Open interfaces – PrestoDB is ANSI SQL compatible. Standard JDBC / ODBC drivers can be used to connect to any reporting / dashboarding / notebook tool. And because it is open source, language clauses continue to be added and expanded.
- Open cloud – PrestoDB is cloud agnostic. Because it runs as a query engine without storage, it aligns natively with containers and can be run on any cloud.
Technology companies prefer this open approach compared to the proprietary formats and technology lock-in that come with the traditional data warehousing approach.
Why isn’t the open federated, disaggregated stack with Presto ubiquitous?
This is the question I asked myself over the past year. As I talked with hundreds of data engineers and platform engineers, the answer became crystal clear.
The power of Presto is fantastic, but still out of reach of many platform engineering teams who may not have the time or skills required to manage Presto. Born in the Hadoop world, Presto is still complex. It’s a distributed data system with extensive configuration, tuning, integration and management required. Managing it on top of containers and systems like Kubernetes makes it even more challenging. While some companies – particularly large Internet ones – enable self-service SQL analytics across many data sources, including both data lakes and databases, many others have not yet been able to do so given the complexity of these activities. This is what Ahana Cloud solves.
Introducing Ahana Cloud
Ahana Cloud for Presto, the first fully integrated, cloud-native managed service for Presto, makes it simpler for cloud and data platform teams to provide self-service SQL analytics for an organization’s analysts and scientists.
- 0 to Presto in 30 minutes with the Ahana Console, including the in-VPC AWS service in your own account
- Ahana comes with a built-in catalog and easy integration with data sources, catalogs and dashboarding tools
- Runs on Amazon Elastic Kubernetes Service for high scalability, availability and manageability
Learn more about Ahana Cloud for Presto here.
Just the beginning
This is just the beginning for us at Ahana. The combination of a cloud managed service with federated SQL analytics has opened up an enormous set of possible innovations to simplify analytics for platform teams – with the eventual goal of being self-managed and self-healing. I am excited that Ahana makes the power of Presto widely accessible and achievable to data platform teams of every size and, at the same time, contributes back to the community and open source.
Are you ready to go from 0 to Presto in 30 minutes?