The Open Data Lakehouse
Data warehouse workloads with better price performance and more flexibility
Combining the best of each: The Open Data Lakehouse
The Open Data Lakehouse brings the reliability and performance of the data warehouse together with the flexibility and better price performance of the data lake, enabling SQL and ML/AI use cases on your data.
Presto is the SQL query engine for the Open Data Lakehouse, enabling warehouse workloads on your data lake for better price performance.
India’s leading instant delivery service moves to The Data Lakehouse
See how Blinkit powers 200K orders/day and deliveries in under 10 minutes
Benefits of a Data Lakehouse
Better Price Performance
More control of your compute costs with Presto
As businesses need more analytics on more data, compute costs can skyrocket in your data warehouse. With Presto for the Data Lakehouse, you get more control over your compute costs for better price performance.
Flexible and Efficient
Support your BI/Dashboarding and Data Science/AI/ML workloads
Do more with your unstructured and semi-structured data. The Data Lakehouse opens up more use cases by enabling you to query all types of data and run your AI/ML frameworks on big data in a flexible and efficient way.
No vendor lock-in, no proprietary data formats
Store data in Open Formats (Parquet, Apache ORC, and more) so you can use any compute engine. Leverage open source technologies (Presto, TensorFlow, and more) to avoid lock-in.
The SQL Data Lakehouse & Foundations for the New Data Stack
Learn what the data lakehouse is and how it solves the challenges of previous solutions
Open Data Lakehouse Components
Presto is the open source SQL query engine for the Data Lakehouse. It enables ad hoc analytics on your data to power your dashboarding and reporting needs. Query data where it lives, with no need to migrate to proprietary data formats.
See how Uber uses Presto at scale for their data lakehouse
Keeping data updated in a data warehouse can be challenging. It typically requires constant ETL from source to destination, which adds time, cost, and duplicated data. Transaction management with technologies like Apache Hudi, Iceberg, or Delta Lake enables ingesting incremental data, managing data capture for inserts and deletions, and ACID transactions.
See how to build your data lakehouse with Apache Hudi and Presto
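To make the idea of incremental ingestion concrete, here is a minimal Python sketch of applying a batch of inserts, updates, and deletions to a table snapshot. It is a conceptual illustration only and does not reflect how Apache Hudi, Iceberg, or Delta Lake are implemented; the `apply_changes` function, the record shape, and the sample data are all hypothetical.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical record shape: primary key -> row (a dict of column values).
Table = Dict[int, dict]


def apply_changes(table: Table, changes: Iterable[Tuple[str, int, dict]]) -> Table:
    """Return a new table snapshot with a batch of changes applied.

    Each change is (op, key, row), where op is "upsert" or "delete".
    The input snapshot is never mutated, so readers of the old snapshot
    keep a consistent view while the new one is being built.
    """
    new_snapshot = dict(table)  # shallow copy: the old snapshot stays intact
    for op, key, row in changes:
        if op == "upsert":
            new_snapshot[key] = row  # insert a new row or overwrite an old one
        elif op == "delete":
            new_snapshot.pop(key, None)  # deleting a missing key is a no-op
        else:
            raise ValueError(f"unknown operation: {op}")
    return new_snapshot


# Illustrative data only.
orders_v1 = {1: {"status": "placed"}, 2: {"status": "placed"}}
batch = [
    ("upsert", 2, {"status": "delivered"}),
    ("upsert", 3, {"status": "placed"}),
    ("delete", 1, {}),
]
orders_v2 = apply_changes(orders_v1, batch)
```

Only the changed rows travel in the batch, which is the point of incremental ingestion: the full table does not have to be reloaded through ETL to pick up a handful of updates.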
Security & Governance
Bring the security and governance of data warehouses to the data lakehouse with technologies like AWS Lake Formation or Apache Ranger – you select which works best for your needs. Define access control policies down to the row level, enabling you to handle sensitive data.
See how to enable AWS Lake Formation with Presto
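As a rough illustration of what a row-level access control policy means, the Python sketch below filters rows according to the caller's role. It is not the AWS Lake Formation or Apache Ranger API; the `POLICIES` table, the role names, and the `visible_rows` helper are hypothetical and exist only to show the concept.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]

# Hypothetical policies: role -> predicate deciding which rows are visible.
POLICIES: Dict[str, Callable[[Row], bool]] = {
    "analyst_eu": lambda row: row["region"] == "EU",
    "admin": lambda row: True,
}


def visible_rows(role: str, rows: List[Row]) -> List[Row]:
    """Return only the rows the given role is allowed to see."""
    policy = POLICIES.get(role)
    if policy is None:
        return []  # deny by default when no policy is defined for the role
    return [row for row in rows if policy(row)]


# Illustrative data only.
customers = [
    {"name": "Asha", "region": "EU"},
    {"name": "Ben", "region": "US"},
]
eu_view = visible_rows("analyst_eu", customers)
admin_view = visible_rows("admin", customers)
guest_view = visible_rows("guest", customers)
```

In a real deployment the policy is enforced by the governance layer rather than by application code, so every query engine reading the data sees the same restrictions.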
The catalog describes all of the data that’s stored in your system to make it usable, so you can analyze it to create dashboards and reports. Having a catalog such as AWS Glue or Hive Metastore, or an open source option like Amundsen, in your Open Data Lakehouse is critical.
See how to configure Hive Metastore in your Open Data Lakehouse
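The role of a catalog can be pictured as a lookup from table names to the metadata a query engine needs. The Python sketch below is a simplified assumption of that idea, not the AWS Glue, Hive Metastore, or Amundsen data model; the `TableEntry` class, the `describe` helper, and the bucket path are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class TableEntry:
    location: str            # where the data files live (hypothetical path)
    file_format: str         # open file format of the stored data
    columns: Dict[str, str]  # column name -> column type


# Hypothetical catalog with a single registered table.
CATALOG: Dict[str, TableEntry] = {
    "sales.orders": TableEntry(
        location="s3://example-bucket/sales/orders/",
        file_format="parquet",
        columns={"order_id": "bigint", "status": "varchar"},
    ),
}


def describe(table_name: str) -> List[str]:
    """List a table's columns as 'name type' strings, in declared order."""
    entry = CATALOG[table_name]
    return [f"{name} {col_type}" for name, col_type in entry.columns.items()]


schema = describe("sales.orders")
```

Because the catalog holds the location, format, and schema in one place, a query engine can resolve a table name to files without the user knowing where or how the data is stored.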
Making it easy: SaaS for Presto
Get a powerful SQL query engine as SaaS for your data lakehouse. Ahana is a managed service for Presto that’s simple to use and cost-effective.
Getting Started with your Data Lakehouse in AWS
Ready to start building your Open Data Lakehouse in AWS? Here are some resources to get you started.
Schedule a Demo
We’ll show you how to migrate data warehouse workloads or build a data lakehouse from scratch.
Customer use case
See why Blinkit moved from a data warehouse to a Data Lakehouse
Learn how the Data Lakehouse is modernizing the data analytics stack
Free Presto trial – Get Started with a Data Lakehouse in AWS
No credit card required