Ahana Cloud for Presto – Simplifying Analytics for the Disaggregated Stack

Dipti Borkar, Cofounder & Chief Product Officer

Today I am incredibly excited to announce Ahana Cloud for Presto, the first fully integrated, cloud-native managed service for Presto. It makes it simpler for cloud and data platform teams of all sizes to provide self-service SQL analytics to their data analysts and scientists. Before I share more about what Ahana Cloud for Presto does, I’d like to share why we built it and the problems it solves.

Data warehousing emerges

Data warehousing emerged in the 90s as the internet was surging in popularity and data volumes were growing rapidly. The new competitive, constantly changing global economy required greater business intelligence, and there was a broad realization that data needed to be integrated to provide the critical business insights for decision-making. Adoption of Teradata, Oracle, Microsoft SQL Server and IBM DB2 warehouses exploded. That requirement for greater business insights has only grown since, and Snowflake’s tremendous IPO last week showed how data warehousing has evolved and moved to the cloud.

Ingest everything into a data warehouse

The first step of data warehousing involves continuously ingesting all data into a single database.

Once data is in the data warehouse, you can query it and report on it. Typically, these systems are closed source, with data stored in proprietary formats. Because of the technology and data lock-in, these systems are also very expensive. 

An alternative architecture arises – the open federated, disaggregated stack 

Over the past 5 years, while the traditional data warehousing approach of a tightly coupled database continued to be adopted, an alternative approach started to be widely adopted by the most innovative technology companies – Facebook, Twitter, Uber, Netflix and others. A loosely coupled disaggregated stack that enabled querying across many databases and data lakes became the dominant standard for their analytics – with the tightly coupled data warehousing approach relegated to legacy workloads. 

This new SQL analytics stack is made of four elements: the query engine, metadata catalog, transaction manager, and storage engine. Presto has emerged as the de facto query engine. Presto is a federated, distributed query engine created and open sourced by Facebook. It is designed to be extensible and pluggable, which led to its extensive connector ecosystem.
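To make the idea of a pluggable, federated engine a little more concrete, here is a minimal Python sketch. It is a toy, not Presto's actual connector API: the `Connector` interface, the `ToyFederatedEngine` class, and the `lake` and `crm` catalog names are all hypothetical and exist only for illustration. It shows an engine that stores no data of its own and joins rows scanned from two independent sources.

```python
from typing import Dict, Iterable, List, Protocol

Row = Dict[str, object]


class Connector(Protocol):
    """Hypothetical connector interface: anything that can scan a named table."""

    def scan(self, table: str) -> Iterable[Row]: ...


class InMemoryConnector:
    """Stand-in for a real data source such as a data lake or an RDBMS."""

    def __init__(self, tables: Dict[str, List[Row]]) -> None:
        self._tables = tables

    def scan(self, table: str) -> Iterable[Row]:
        return iter(self._tables[table])


class ToyFederatedEngine:
    """Holds no data itself; it only routes scans to registered catalogs."""

    def __init__(self) -> None:
        self._catalogs: Dict[str, Connector] = {}

    def register(self, catalog: str, connector: Connector) -> None:
        self._catalogs[catalog] = connector

    def _scan(self, qualified: str) -> Iterable[Row]:
        # Tables are addressed as "catalog.table" in this toy model.
        catalog, table = qualified.split(".", 1)
        return self._catalogs[catalog].scan(table)

    def join(self, left: str, right: str, key: str) -> List[Row]:
        """Hash-join two tables that may live in different sources."""
        index: Dict[object, List[Row]] = {}
        for right_row in self._scan(right):
            index.setdefault(right_row[key], []).append(right_row)
        return [
            {**left_row, **right_row}
            for left_row in self._scan(left)
            for right_row in index.get(left_row[key], [])
        ]


engine = ToyFederatedEngine()
engine.register("lake", InMemoryConnector({"clicks": [
    {"user_id": 1, "page": "/home"},
    {"user_id": 2, "page": "/pricing"},
    {"user_id": 1, "page": "/docs"},
]}))
engine.register("crm", InMemoryConnector({"users": [
    {"user_id": 1, "name": "Ada"},
    {"user_id": 2, "name": "Grace"},
]}))

joined = engine.join("lake.clicks", "crm.users", key="user_id")
for row in joined:
    print(row)
```

Adding a new data source in this sketch means registering one more connector; the engine itself does not change, which is the property that makes a large connector ecosystem possible.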


But why? Why is this disaggregated stack, with Presto as the foundation, the preferred choice for the most advanced technology companies, which could afford to buy products off the shelf?

First, this federated, disaggregated stack addresses the new realities of data

  • There is just too much data being generated and a single database is no longer the solution to support a wide range of analytics 
  • Data will be stored in data lakes, but other pertinent data will still reside in a range of other databases
  • SQL analytics is needed for both the data lake where raw data resides in cheap storage as well as the broad range of other databases data continues to live in
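As a rough illustration of what SQL analytics across both kinds of sources looks like, the sketch below builds a single query that joins a data lake table with a table in an operational database. Presto addresses tables as `catalog.schema.table`; everything else here is an assumption made for the example, including the `hive` and `mysql` catalog names, the schema and table names, and the helper functions themselves.

```python
def qualified(catalog: str, schema: str, table: str) -> str:
    """Return a fully qualified table name in catalog.schema.table form."""
    return f"{catalog}.{schema}.{table}"


def build_federated_query(lake_table: str, db_table: str) -> str:
    """Build one SQL statement that joins a lake table with a database table.

    The column names (user_id, name) are hypothetical.
    """
    return (
        "SELECT u.name, count(*) AS clicks\n"
        f"FROM {lake_table} AS c\n"
        f"JOIN {db_table} AS u ON c.user_id = u.user_id\n"
        "GROUP BY u.name"
    )


# Hypothetical catalogs: raw click data in the lake, user records in MySQL.
query = build_federated_query(
    qualified("hive", "web", "clicks"),
    qualified("mysql", "crm", "users"),
)
print(query)
```

The analyst writes one statement; which system each table lives in is expressed only through the catalog prefix.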

Second, this federated, disaggregated stack is open 

Open source – PrestoDB under the Linux Foundation is completely open source under the Apache 2.0 license. This means that you benefit from the best innovations, not just from one vendor but from the entire community. 

Open formats – PrestoDB doesn’t use any proprietary formats. In fact, it supports most of the common formats like JSON, Apache ORC, Apache Parquet and others.

Open interfaces – PrestoDB is ANSI SQL compatible. Standard JDBC / ODBC drivers can be used to connect to any reporting / dashboarding / notebook tool. And because it is open source, language clauses continue to be added in and expanded on. 

Open cloud – PrestoDB is cloud agnostic. Because it runs as a query engine without storage, it aligns naturally with containers and can run on any cloud.

Technology companies prefer this open approach compared to the proprietary formats and technology lock-in that come with the traditional data warehousing approach.

Why isn’t the open federated, disaggregated stack with Presto ubiquitous? 

This is the question I asked myself over the past year. As I talked with hundreds of data engineers and platform engineers, it became crystal clear. 

The power of Presto is fantastic, but it is still out of reach for many platform engineering teams who may not have the time or skills required to manage it. Born in the Hadoop world, Presto is still complex. It’s a distributed data system that requires extensive configuration, tuning, integration and management. Managing it on top of containers and systems like Kubernetes makes it even more challenging. While some companies, particularly large Internet ones, enable self-service SQL analytics across many data sources, including both data lakes and databases, many others have not yet been able to do so given the complexity involved. This is what Ahana Cloud solves.

Introducing Ahana Cloud 

Ahana Cloud for Presto, the first fully integrated, cloud-native managed service, makes it simpler for cloud and data platform teams to provide self-service SQL analytics to an organization’s analysts and scientists.

Easy

From 0 to Presto in 30 minutes, including the in-VPC AWS service in your own account, using the Ahana Console

Fully integrated 

Ahana comes with a built-in catalog and easy integration with data sources, catalogs and dashboarding tools

Cloud Native

Runs on Amazon Elastic Kubernetes Service for high scalability, availability and manageability

Learn more about Ahana Cloud for Presto here

Just the beginning 

This is just the beginning for us at Ahana. The combination of a cloud managed service with federated SQL analytics has opened up an enormous set of possible innovations to simplify analytics for platform teams – with the eventual goal of being self-managed and self-healing.  I am excited that Ahana makes the power of Presto widely accessible and achievable to data platform teams of every size and, at the same time, contributes back to the community and open source. 

Are you ready to go from 0 to Presto in 30 minutes? 

Sign up now to get Free Early Access.

Introducing Ahana

Dipti Borkar, Co-Founder & CPO and Steven Mih, Co-Founder & CEO

There are many factors that contribute to strong co-founder relationships. Forbes describes one set as the 3Ts. The first is ‘trust’, which is self-explanatory. The second is ‘talk’, the ability to talk things out even when you disagree. The third is ‘target’, having one mission with aligned goals. While it’s easy to put this into a framework, getting it right is rare. For us, it took nearly 10 years, joint experiences at two companies, and working through numerous tough situations and tough conversations. The outcome is Ahana.

While we’re heads down working on Ahana, we’d like to share our vision and the problem we intend to solve. Our vision is to simplify the interactive, ad hoc analytics journey for users. But hasn’t this problem been solved already? Let’s walk through the evolution of data systems for a minute or two. 

An architectural shift 

It has been nearly 15 years since the Google MapReduce paper was published (2004) and Amazon Web Services was launched (2006). These were key milestones that marked the beginnings of two of the biggest trends in enterprise software, trends that continue even today. In these 15 years a lot has changed. Data is stored in a lot more places than the three early relational databases (IBM DB2, Oracle, Microsoft SQL Server). In fact, with polyglot persistence, organizations have different data systems for each use case. This also means that the metadata that used to be unified as the star schema of the sacred data warehouse is now spread across a variety of data sources.

In addition, the five key components of a database as defined in database textbooks are now independently running pieces of software in the big data stack. The database stack is completely disaggregated: the query processor (examples: Apache Hive, Apache Spark, Presto), the metadata catalog (examples: Hive Metastore, AWS Glue), the storage engine (examples: object stores, AWS S3, Google GCS, Azure ADLS, other RDBMSes) and even the transaction manager (examples: Apache Hudi, Delta Lake). This separation brings great flexibility at the cost of tremendous complexity.

Source: Textbook: Architecture of a Database System, Hellerstein, Stonebraker, Hamilton

The largest internet companies have some variation of these components in production at massive scale. Big platform teams with the brightest engineers integrate these components, innovate on them, deploy them and maintain them. And while this is complex and resource intensive, they get tremendous value by enabling data-driven decision-making with ad hoc analytics platforms.

But every company and organization should have the ability to make interactive, ad hoc data-driven decisions without the need to integrate, manage and deploy a complex stack.

Data Federation makes a comeback

The heart of the data stack is the query processor, the query engine. And the good news is that there has been immense innovation in the open source community on modern standalone query engines. In fact, the idea of a separated query engine isn’t new. Data Federation has had many iterations over time.  

Data Federation 1.0 started with the ACM Federated Architecture paper by McLeod and Heimbigner (1985). History rhymed again with Data Federation 2.0 in the early 2000s, with the founding of Composite Software and the publication of the Garlic Paper on DB2 Federation (2002). But there were still too few data sources to query against, and the stack wasn’t disaggregated, which reduced the need for federation.

Data Federation 3.0 made a comeback with Google’s Dremel paper in 2010. Couchbase, where we both worked for many years, implemented SQL++, another federated query engine designed at the UC San Diego database lab. Presto was designed and built at Facebook and open sourced in 2013. Since then, its adoption has simply exploded. 

Facebook and the creators of Presto (Martin, David and Dain, who were engineers there) have built a nicely designed distributed system (Presto IEEE paper): highly modular, pluggable and extensible. While there is room for improvement (like moving away from a single coordinator node, better resource management, and a more advanced planner that reads less data and pushes down more work), Presto has become the de facto choice for interactive, ad hoc querying on a disaggregated stack.

Community-driven Presto and Presto Foundation

We have both learned from our enterprise software experience that developers and engineers primarily care about solving problems in order to get things done. As open source software grows in importance to an organization, its developers also care about a project’s transparency, openness and neutrality. Open source software has moved from its fringe beginnings to the forefront of technology innovation. Both of us have participated in the evolution of open source commercialization, from support-only models to open-core proprietary subscription models and even infrastructure software as a service. The common underlying factor in all these models is the need for a strong, vibrant open source community. Meeting that need takes more than an open source license like Apache 2.0. It takes a thoughtful, transparent, and authentic approach in all interactions. We believe that is open source done right.

At Alluxio, where we both worked recently, we became very involved with the open source Presto community. Dipti presented many joint talks with founders of Starburst Data as well as pushed out product offerings integrated with Starburst Presto. Steven engaged with many end user companies to evaluate and deploy Presto with Alluxio into production environments.

We soon realized that there was a lot more to the Presto community than we knew. There were in fact two separate GitHub repos, two Slack channels and two websites, all of it very, very confusing for any open source community. Then in September 2019, Facebook donated the original project, PrestoDB, to the Linux Foundation, to further grow and evangelize the community under an established open source governance model similar to CNCF and Kubernetes. We joined the Presto Foundation (Steven as a member of the Governing Board and Dipti as chairperson of the Outreach Committee) to evangelize and support Presto.

A new adventure begins

We had talked about founding a company a few times in the past, and seeing the ever-growing problems with disparate data systems, combined with the federated query engine returning to the forefront, we have embarked on a new venture. We believe that data federation 3.0 with Presto will become the architectural foundation to meet the needs of modern data teams. 

So here we are. We are excited to share that Ahana has raised $2.25 million in funding led by GV (formerly Google Ventures), with participation from Leslie Ventures and other angel investors. We’re thrilled to have Dave Munichiello from GV as our lead investor. He and the whole GV team have continued to be fantastic in their support of our vision. We are excited to build out our technical team and deliver simplified Presto-based analytics products for every organization. Stay tuned by joining our All Things PrestoDB newsletter.

We’ve worked together at two companies, we have a great working relationship, and we’re passionate about bringing open source products to market. This time, we’re thrilled to start from the ground up, as friends and as co-founders of Ahana. Cheers!

San Francisco, April 2020 – Dipti Borkar and Steven Mih celebrating, socially distanced, the closing of the Ahana seed round in Steven’s garage office with the leaf blower