Introducing Ahana

Dipti Borkar, Co-Founder & CPO and Steven Mih, Co-Founder & CEO

Meet Ahana

There are many factors that contribute to strong co-founder relationships. Forbes refers to one set and defines them as the 3Ts: ‘trust’, which is self-explanatory; ‘talk’, the ability to talk things out even when you disagree; and ‘target’, having one mission with aligned goals. While it’s easy to put this into a framework, getting it right is rare. For us, it took nearly 10 years, joint experiences at two companies, and working through numerous tough situations and tough conversations. The outcome is Ahana.

While we’re heads down working on Ahana, we’d like to share our vision and the problem we intend to solve. Our vision is to simplify the interactive, ad hoc analytics journey for users. But hasn’t this problem been solved already? Let’s walk through the evolution of data systems for a minute or two. 

An architectural shift 

It has been nearly 15 years since the Google MapReduce paper was published (2004) and Amazon Web Services was launched (2006). These were key milestones that marked the beginnings of two of the biggest trends in enterprise software, trends that continue even today. In these 15 years a lot has changed. Data is stored in many more places than the big three early relational databases (IBM DB2, Oracle, Microsoft SQL Server). In fact, with polyglot persistence, organizations now run a different data system for each use case. This also means that the metadata, once unified in the star schema of the sacred data warehouse, is now spread across a variety of data sources.

In addition, the key components of a database, as defined in database textbooks, are now independently running pieces of software in the big data stack. The database stack is completely disaggregated: the query processor (examples: Apache Hive, Apache Spark, Presto), the metadata catalog (examples: Hive Metastore, AWS Glue), the storage engine (examples: object stores such as AWS S3, Google GCS, and Azure ADLS, as well as other RDBMSes), and even the transaction manager (examples: Apache Hudi, Delta Lake). This separation brings great flexibility at the cost of tremendous complexity.

[Figure: components of a database system. Source: Architecture of a Database System, Hellerstein, Stonebraker, Hamilton]
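
To make that disaggregation concrete, here is a minimal sketch of what the separation looks like from a user’s seat. It is not any particular product of ours – it assumes a running Presto cluster and the open source presto-python-client, and the hostname, user, and table names are placeholders.

```python
# Minimal sketch of querying a disaggregated stack (assumptions: a running
# Presto coordinator, the open source presto-python-client, placeholder names).
import prestodb  # pip install presto-python-client

conn = prestodb.dbapi.connect(
    host="presto-coordinator.example.com",  # query processor: Presto
    port=8080,
    user="analyst",
    catalog="hive",      # metadata catalog: Hive Metastore or AWS Glue sits behind this connector
    schema="default",
)

cur = conn.cursor()
# The table's files live in an object store (e.g. S3); the metastore holds the
# schema and partition locations; Presto only plans and executes the query.
cur.execute("SELECT event_type, count(*) AS n FROM events GROUP BY event_type")
for row in cur.fetchall():
    print(row)
```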

The largest internet companies have some variation of these components in production at massive scale. Big platform teams with the brightest engineers integrate these components, innovate on them, deploy them, and maintain them. And while it is complex and resource intensive, they get tremendous value from enabling data-driven decision-making with ad hoc analytics platforms.

But every company and organization should have the ability to make interactive, ad hoc data-driven decisions without the need to integrate, manage and deploy a complex stack.

Data Federation makes a comeback

The heart of the data stack is the query processor, the query engine. And the good news is that there has been immense innovation in the open source community on modern standalone query engines. In fact, the idea of a separated query engine isn’t new. Data Federation has had many iterations over time.  

Data Federation 1.0 started with the ACM Federated Architecture paper by McLeod and Heimbigner (1985). History rhymed again with Data Federation 2.0 in the early 2000s, with the founding of Composite Software and the publication of the Garlic paper on DB2 federation (2002). But there were still too few data sources to query against, and the stack wasn’t disaggregated, which reduced the need for federation.

Data Federation 3.0 made a comeback with Google’s Dremel paper in 2010. Couchbase, where we both worked for many years, implemented SQL++, a federated query language designed at the UC San Diego database lab. Presto was designed and built at Facebook and open sourced in 2013. Since then, its adoption has simply exploded.

Facebook and the creators of Presto – Martin, David, and Dain, who were engineers there – built a nicely designed distributed system (see the Presto IEEE paper): highly modular, pluggable, and extensible. While there is room for improvement (such as moving away from a single coordinator node, better resource management, and a more advanced planner that reads less data and pushes down more work), Presto has become the de facto choice for interactive, ad hoc querying on a disaggregated stack.
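
As a small illustration of what that de facto choice looks like in practice, here is a hedged sketch of a federated, ad hoc query that joins a table living in an object store (through the hive catalog) with a table in an operational MySQL database (through the mysql connector). The catalogs, schemas, and column names are invented for illustration, and it again assumes the open source presto-python-client against a cluster that has both connectors configured.

```python
# Sketch of an ad hoc federated query in Presto: one SQL statement spanning two
# connectors. All names below (hosts, catalogs, schemas, columns) are illustrative.
import prestodb

conn = prestodb.dbapi.connect(
    host="presto-coordinator.example.com",
    port=8080,
    user="analyst",
    catalog="hive",
    schema="default",
)
cur = conn.cursor()

# Clickstream files in an object store (hive catalog) joined against customer
# records in an operational database (mysql catalog), in a single query.
cur.execute("""
    SELECT c.segment, count(*) AS clicks
    FROM hive.web.clicks AS k
    JOIN mysql.crm.customers AS c ON k.customer_id = c.id
    WHERE k.click_date = DATE '2020-04-01'
    GROUP BY c.segment
    ORDER BY clicks DESC
""")
print(cur.fetchall())
```

Because each catalog is just connector configuration on the Presto side, adding another data source changes nothing in the client code – which is exactly the appeal of a federated query engine on a disaggregated stack.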

Community-driven Presto and Presto Foundation

We have both learned from our enterprise software experience that developers and engineers primarily care about solving problems in order to get things done. As the use of an open source project grows in importance to an organization, its developers also come to care about the project’s transparency, openness, and neutrality. Open source software has moved from its fringe beginnings to the forefront of technology innovation. While both of us have participated in the evolution of open source commercialization – from support-only models to open-core proprietary subscription models, and even to infrastructure software as a service – the common underlying factor across all of these models is the requirement of a strong, vibrant open source community. Meeting that requirement takes more than simply an open source license like Apache 2.0. It takes a thoughtful, transparent, and authentic approach in all interactions. We believe that is open source done right.

At Alluxio, where we both worked recently, we became very involved with the open source Presto community. Dipti presented many joint talks with the founders of Starburst Data and pushed out product offerings integrated with Starburst Presto. Steven engaged with many end user companies to evaluate and deploy Presto with Alluxio into production environments.

We soon realized that there was a lot more to the Presto community than we knew. There were in fact two separate GitHub repos, two Slack channels, and two websites – all very, very confusing for any open source community. Then in September 2019, Facebook donated the original project, PrestoDB, to the Linux Foundation to further grow and evangelize the community under an established open source governance model similar to the CNCF and Kubernetes. We joined the Presto Foundation (Steven as a member of the Governing Board, and Dipti as chairperson of the Outreach Committee) to evangelize and support Presto.

A new adventure begins

We had talked about founding a company a few times in the past, and seeing the ever-growing problems with disparate data systems, combined with the federated query engine returning to the forefront, we have embarked on a new venture. We believe that data federation 3.0 with Presto will become the architectural foundation to meet the needs of modern data teams. 

So here we are. We are excited to share that Ahana has raised $2.25 million in funding led by GV (formerly Google Ventures), with participation from Leslie Ventures and other angel investors. We’re thrilled to have Dave Munichiello from GV as our lead investor. He and the whole GV team have continued to be fantastic in their support of our vision. We are excited to build out our technical team and deliver simplified Presto-based analytics products for every organization. Stay tuned by joining our All Things PrestoDB newsletter.

We’ve worked together at two companies, we have a great working relationship, and we’re passionate about bringing open source products to market. This time, we’re thrilled to start from the ground up, as friends and as co-founders of Ahana. Cheers!

Ahana founders
San Francisco, April 2020 – Dipti Borkar and Steven Mih, social-distance celebrating the closing of the Ahana seed round in Steven’s garage office with the leaf blower