Presto Use Cases
No matter which version you’re using, Presto is used for a wide range of analytical use cases and is particularly good for interactive and ad hoc querying.
PrestoDB is for everyone!
It doesn’t matter whether you’re operating at Facebook-like scale or running just a few nodes – PrestoDB can help with your interactive and ad hoc querying.
How Presto is used at Facebook
Facebook developed and deployed Presto in 2012. Today, Presto powers many different use cases at Facebook, including:
Interactive analytics: Quickly analyze data and execute queries in seconds or minutes using visualizations, dashboards, notebooks, and BI tools.
Batch ETL: Populate and process data in the warehouse for better throughput and efficiency.
A/B testing: Choose which data to slice and dice and evaluate product changes via hypothesis testing, with results in hours (not days).
App analytics: Build external-facing custom reporting tools that serve hundreds of concurrent queries with low latency and high availability.
More Common Use Cases
Ad hoc querying
Use SQL to run ad hoc queries whenever you want, wherever your data resides. Presto lets you query data where it’s stored, so you don’t have to ETL data into a separate system. With Presto connectors and their in-place execution, platform teams can quickly provide access to the datasets analysts are interested in. Along with that access comes the power of Presto to return results in seconds instead of hours, so analysts can iterate quickly on new hypotheses through interactive exploration of any dataset, residing anywhere.
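An ad hoc query against data in place might look like the following sketch. The `hive` catalog and the `web.page_views` table are hypothetical names used for illustration:

```sql
-- Ad hoc query run directly where the data lives, with no ETL step.
-- Catalog, schema, and table names here are illustrative.
SELECT url, count(*) AS views
FROM hive.web.page_views
WHERE view_date = DATE '2023-01-15'
GROUP BY url
ORDER BY views DESC
LIMIT 10;
```

Because Presto addresses tables as `catalog.schema.table`, the same SQL works regardless of which connector backs the catalog.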
Reporting & dashboarding
Query data across multiple sources to build reports and dashboards for self-service business intelligence. Presto gives data scientists and analysts the ability to query data across sources on their own, so they’re not dependent on data platform engineers.
Transformation using SQL (ETL)
Aggregate terabytes of data across multiple data sources and run efficient ETL queries against that data with Presto. Instead of legacy batch processing systems, use Presto to run resource-efficient, high-throughput queries.
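A SQL-based transformation can be expressed as a single `CREATE TABLE AS` statement. The catalog, schema, table, and column names below are hypothetical:

```sql
-- Aggregate raw events into a daily summary table in one ETL query.
-- All names (hive catalog, raw/warehouse schemas, columns) are illustrative.
CREATE TABLE hive.warehouse.daily_orders AS
SELECT
    order_date,
    region,
    count(*)       AS order_count,
    sum(total_usd) AS revenue_usd
FROM hive.raw.orders
GROUP BY order_date, region;
```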
Data lake analytics
Query data directly on a data lake without the need for transformation. Presto enables you to query any type of data in your data lake, including both structured and unstructured data.
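For example, semi-structured JSON sitting in the lake can be queried as-is with Presto’s JSON functions, with no upfront transformation. The table and column names below are hypothetical:

```sql
-- Pull a field out of raw JSON payloads stored in the lake.
-- Table and column names are illustrative.
SELECT
    json_extract_scalar(payload, '$.device.os') AS os,
    count(*) AS events
FROM hive.lake.raw_events
GROUP BY 1;
```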
Federated querying across multiple data sources
Query data across many different data sources, including databases, data lakes, and lakehouses, whether on premises or in the cloud. Presto joins and aggregates the results across sources in its distributed, in-memory query engine, so the data never has to be copied into one system first.
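A federated query simply references tables from different catalogs in one statement. In the sketch below, the `hive` and `mysql` catalog names and the table layout are hypothetical:

```sql
-- Federated join: clickstream events in a data lake enriched with
-- customer records from an operational database, in a single query.
-- Catalog, schema, and table names are illustrative.
SELECT c.segment, count(*) AS clicks
FROM hive.events.clicks AS e
JOIN mysql.crm.customers AS c
  ON e.customer_id = c.id
GROUP BY c.segment;
```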
Real world examples from the community
Presto powers analytics on Uber’s data lake and is so widely used that roughly half the company logs into Presto at least once a month as part of their work.
Watch the 1-minute video from the Presto virtual roundtable.
Jampp is a mobile app marketing platform that uses programmatic ads to acquire new users and retarget those users with relevant ads. It sits between advertisers and their audiences, so real-time bidding for media advertising space is critical to its business. The amount of data Jampp generates as part of the bidding cycle is massive: 1.7 billion events tracked per day, 550K requests received per second, and 100 TB of data processed per day by AWS elastic load balancers. PrestoDB plays a critical role in their data infrastructure: Jampp relies on Presto on Amazon EMR for its ad hoc workload, running over 3,000 ad hoc queries per day against more than 600 TB of queryable data.
Alibaba leverages Presto for data lake analytics to provide serverless, federated analytics.
Watch the 1-minute video from the Presto virtual roundtable.
Twitter developed a highly scalable query predictor service by applying machine learning algorithms to roughly 10 million historical Presto queries, classifying queries by their CPU time and peak memory usage. The service helps improve the performance of Presto clusters and provides expected execution statistics on business intelligence dashboards.
LinkedIn extended its Presto Hive catalog with a smart logical abstraction layer capable of reasoning about logical views with UDFs, built on two core components: Coral and Transport UDFs. Coral is a view virtualization library, powered by Apache Calcite, that represents views by their logical query plans.
LinkedIn leverages Coral’s abstractions to decouple the view expression language from the execution engine, enabling it to execute non-Presto-SQL views inside Presto and to rewrite queries on the fly for data governance and query optimization.