The Open Data Lakehouse

Data warehouse workloads at better price performance and with more flexibility

Combining the best of each: The Open Data Lakehouse

The Open Data Lakehouse brings the reliability and performance of the data warehouse together with the flexibility and better price performance of the data lake, enabling SQL and ML/AI use cases on your data.

Presto is the SQL query engine for the Open Data Lakehouse, enabling warehouse workloads on your data lake for better price performance.

The next enterprise data warehouse is the Open Data Lakehouse

India’s leading instant delivery service moves to the Open Data Lakehouse

See how Blinkit powers 200K orders/day and deliveries in under 10 minutes

Benefits of a Data Lakehouse

Better Price Performance

More control of your compute costs with Presto

As businesses need more analytics on more data, compute costs can skyrocket in your data warehouse. With Presto for the Open Data Lakehouse, you get more control over your compute costs for better price performance.

Flexible and Efficient

Support your BI/Dashboarding and Data Science/AI/ML workloads

Do more with your unstructured and semi-structured data. The Open Data Lakehouse opens up more use cases by enabling you to query all types of data and run your AI/ML frameworks on big data in a flexible and efficient way.

Open

No vendor lock-in, no proprietary data formats

Store data in open formats (Apache Parquet, Apache ORC, and more) so you can use any compute engine. Leverage open source technologies (Presto, TensorFlow, and more) to avoid lock-in.

Open Data Lakehouse Components

Query Engine

Presto is the open source SQL query engine for the Data Lakehouse. It enables ad hoc analytics on your data to power your dashboarding and reporting needs. Query data where it lives, with no need to migrate it to proprietary data formats.
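As a rough illustration of what querying data where it lives can look like from a client, the sketch below builds (without sending) an HTTP request for Presto's REST statement endpoint. The host, user, catalog, schema, and the `orders` table are assumptions made up for this example, not values from any real deployment.

```python
import urllib.request


def build_presto_request(sql, host="http://localhost:8080", user="analyst",
                         catalog="hive", schema="default"):
    """Build (but do not send) a POST to Presto's /v1/statement endpoint."""
    headers = {
        "X-Presto-User": user,        # who is running the query
        "X-Presto-Catalog": catalog,  # assumed catalog name
        "X-Presto-Schema": schema,    # assumed schema name
    }
    return urllib.request.Request(
        url=f"{host}/v1/statement",
        data=sql.encode("utf-8"),     # the SQL text is the request body
        headers=headers,
        method="POST",
    )


# Hypothetical table and columns, for illustration only.
request = build_presto_request(
    "SELECT order_date, count(*) AS orders FROM orders GROUP BY order_date"
)
```

Sending the request with `urllib.request.urlopen` against a running coordinator returns JSON. Clients typically follow the `nextUri` field in each response until it is absent to collect all result rows.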

See how Uber uses Presto at scale for their data lakehouse >

Transaction Management

Keeping data updated in a data warehouse can be challenging. It typically requires constant ETL from source to destination, which adds time, cost, and duplicated data. Transaction management with technologies like Apache Hudi, Apache Iceberg, or Delta Lake enables ingesting incremental data, managing data capture for inserts and deletions, and ACID transactions.
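To make the idea of incremental, all-or-nothing updates more concrete, here is a small conceptual sketch in plain Python. It is not the API of Apache Hudi, Iceberg, or Delta Lake. The `apply_batch` helper, the operation names, and the sample rows are all hypothetical, and the sketch only shows the general pattern of applying a batch of upserts and deletes to a new snapshot while the previous snapshot stays untouched.

```python
def apply_batch(snapshot, changes):
    """Return a new snapshot with every change applied.

    If any change is invalid, raise an error before returning, so the
    caller never sees a half-applied batch and the old snapshot is unchanged.
    """
    new_snapshot = dict(snapshot)  # copy-on-write: readers keep the old version
    for operation, key, row in changes:
        if operation == "upsert":
            new_snapshot[key] = row          # insert a new row or replace an old one
        elif operation == "delete":
            new_snapshot.pop(key, None)      # remove the row if it exists
        else:
            raise ValueError(f"unknown operation: {operation}")
    return new_snapshot


# Hypothetical order records keyed by order id.
table_v1 = {1: {"status": "placed"}, 2: {"status": "placed"}}
batch = [
    ("upsert", 1, {"status": "delivered"}),
    ("delete", 2, None),
    ("upsert", 3, {"status": "placed"}),
]
table_v2 = apply_batch(table_v1, batch)
```

Only the changed records are supplied in the batch, so there is no need to reload the full table, and a failed batch leaves `table_v1` exactly as it was.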

See how to build your data lakehouse with Apache Hudi and Presto >

Security & Governance

Bring the security and governance of data warehouses to the data lakehouse with technologies like AWS Lake Formation or Apache Ranger – you select which works best for your needs. Define access control policies down to the row level, enabling you to handle sensitive data.
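Row-level access control can be pictured as a filter that is applied to every query on behalf of the user. The sketch below is a simplified illustration in plain Python, not the policy model of AWS Lake Formation or Apache Ranger. The role names, the `region` column, and the `visible_rows` helper are assumptions invented for this example.

```python
# Hypothetical policies: each role maps to a predicate over a row.
ROW_POLICIES = {
    "analyst_in": lambda row: row["region"] == "IN",  # sees rows for one region only
    "admin": lambda row: True,                        # sees every row
}


def visible_rows(role, rows):
    """Return only the rows the given role is allowed to read."""
    policy = ROW_POLICIES.get(role)
    if policy is None:
        return []  # deny by default when no policy is defined for the role
    return [row for row in rows if policy(row)]


# Hypothetical sample data.
orders = [
    {"order_id": 1, "region": "IN"},
    {"order_id": 2, "region": "US"},
]
analyst_view = visible_rows("analyst_in", orders)
```

In this sketch a role without a policy sees nothing, which mirrors the common deny-by-default approach for sensitive data.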

See how to enable AWS Lake Formation with Presto >

Catalog/Metadata

The catalog describes all of the data that’s stored in your system to make it usable, so you can analyze it to create dashboards and reports. A catalog such as AWS Glue or Hive Metastore, or an open source option like Amundsen, is a critical part of your Open Data Lakehouse.

See how to configure Hive Metastore in your Open Data Lakehouse >

Making it easy: SaaS for Presto

Get a powerful SQL query engine as SaaS for your data lakehouse. Ahana is a managed service for Presto that’s simple to use and cost-effective.

Getting Started with your Open Data Lakehouse in AWS

Ready to start building your Open Data Lakehouse in AWS? Here are some resources to get you started.

Schedule a Demo

We’ll show you how to migrate data warehouse workloads or build an Open Data Lakehouse from scratch.

Schedule Now

Customer use case

See why Blinkit moved from the data warehouse to the Open Data Lakehouse

Read case study

Webinar

See how to build an Open Data Lakehouse stack in our on-demand webinar.

Watch on-demand

Free Presto trial – Get Started with the Open Data Lakehouse in AWS
No credit card required