Understanding Presto

PrestoDB, now hosted under the Linux Foundation, is an open source distributed SQL query engine.

We’ve curated this page with the best reads from the PrestoDB community.


Getting Started with Presto

Hands-on Guide

Community Slack

PrestoDB GitHub

PrestoDB Mailing List


Presto Overview

Presto is a federated, distributed SQL query engine that runs on a cluster of machines. It enables interactive, ad hoc analytics on large amounts of data using SQL. Rather than requiring data to be moved first, PrestoDB queries data where it lives, including Hive, Amazon S3, Hadoop, Cassandra, relational databases, NoSQL databases, and even proprietary data stores. Because a single Presto query can read from multiple sources, the open source engine supports analytics across an entire organization.

[Figure: PrestoDB connections]

A full Presto installation includes a coordinator and multiple workers. Clients such as the Presto CLI submit queries to the coordinator, which parses, analyzes, and plans the query execution, then distributes the processing to the workers.

[Figure: PrestoDB workers and connectors]
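The client-to-coordinator flow described above can be sketched against Presto's HTTP client protocol, which the Presto CLI also uses: the client POSTs the SQL text to `/v1/statement` on the coordinator, then keeps following the `nextUri` field of each JSON response until it no longer appears. This is a minimal sketch, not a production client. The coordinator address, user, catalog, and schema are placeholder assumptions, and the helper names (`build_statement_request`, `fetch_json`, `iter_rows`) are hypothetical.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, Iterator, List

# Placeholder coordinator address (assumption); adjust for your cluster.
COORDINATOR = "http://localhost:8080"


def build_statement_request(sql: str, user: str, catalog: str,
                            schema: str) -> urllib.request.Request:
    """Build (but do not send) the POST that submits a query to the coordinator."""
    return urllib.request.Request(
        url=f"{COORDINATOR}/v1/statement",
        data=sql.encode("utf-8"),
        method="POST",
        headers={
            "X-Presto-User": user,        # who is running the query
            "X-Presto-Catalog": catalog,  # session default catalog
            "X-Presto-Schema": schema,    # session default schema
        },
    )


def fetch_json(url_or_request: Any) -> Dict[str, Any]:
    """Send a request (or GET a URL) and decode the JSON body. Needs a live cluster."""
    with urllib.request.urlopen(url_or_request) as resp:
        return json.load(resp)


def iter_rows(first_response: Dict[str, Any],
              fetch: Callable[[str], Dict[str, Any]] = fetch_json) -> Iterator[List[Any]]:
    """Yield result rows, following nextUri until the coordinator stops sending one."""
    response = first_response
    while True:
        if "error" in response:
            raise RuntimeError(response["error"].get("message", "query failed"))
        yield from response.get("data", [])
        next_uri = response.get("nextUri")
        if next_uri is None:
            return
        response = fetch(next_uri)
```

Against a running cluster, usage would look like `rows = list(iter_rows(fetch_json(build_statement_request("SELECT 1", "alice", "tpch", "tiny"))))`. Passing `fetch` in as a parameter keeps the paging logic testable without a network connection.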

Watch: Why is PrestoDB so important, and how does it help today’s users?

For a better understanding of Presto, hear from Presto Foundation members Facebook, Uber, Ahana™, and Alibaba in a virtual roundtable.

For more companies running PrestoDB in production, see prestodb.io.

Technical Concepts

Presto Server Types

Presto Coordinator: Responsible for parsing statements, planning queries, and managing Presto worker nodes.
Presto Worker: Responsible for executing tasks and processing data.

[Figure: Presto server]
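The division of labor between the two server types can be illustrated with a toy, single-process model: a coordinator hands out splits to workers, each worker processes its own splits, and the partial results are combined. This is an illustrative sketch only. Round-robin assignment and all function names are assumptions made for the example; real Presto split scheduling is considerably more involved.

```python
from collections import defaultdict
from itertools import cycle
from typing import Dict, List

Split = List[int]  # toy stand-in for a section of a larger data set


def assign_splits(splits: List[Split], workers: List[str]) -> Dict[str, List[Split]]:
    """Coordinator side (toy): distribute splits across workers round-robin."""
    if not workers:
        raise ValueError("a cluster needs at least one worker")
    assignment: Dict[str, List[Split]] = defaultdict(list)
    for split, worker in zip(splits, cycle(workers)):
        assignment[worker].append(split)
    return dict(assignment)


def worker_sum(splits: List[Split]) -> int:
    """Worker side (toy): process the assigned splits and return a partial result."""
    return sum(sum(split) for split in splits)


def cluster_sum(splits: List[Split], workers: List[str]) -> int:
    """Distribute the work, run it on each worker, and combine the partial results."""
    assignment = assign_splits(splits, workers)
    partials = [worker_sum(assigned) for assigned in assignment.values()]
    return sum(partials)
```

For example, `assign_splits([[1, 2], [3], [4, 5], [6]], ["worker-1", "worker-2"])` gives `worker-1` the splits `[1, 2]` and `[4, 5]` and `worker-2` the splits `[3]` and `[6]`, and `cluster_sum` over the same inputs returns 21.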

Presto Data Sources

Connector: A connector adapts Presto to a data source such as Hive or a relational database. Presto ships with several built-in connectors, including the JMX, Hive, and TPC-H connectors.
Catalog: A Presto catalog contains schemas and references a data source via a connector.
Schema: A schema organizes tables within a catalog; together, a catalog and a schema define a set of tables that can be queried.
Table: A set of unordered rows organized into named columns with types.

[Figure: Presto data sources]
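The catalog, schema, and table hierarchy above can be made concrete. In Presto, each catalog is configured by a properties file in `etc/catalog/`: the file name becomes the catalog name, and the `connector.name` property selects the connector. A table name that is not fully qualified is completed from the session's default catalog and schema. The Python sketch below models both ideas. The Hive properties follow the example in the Presto Hive connector documentation with a placeholder metastore host, the parser is deliberately simplified (real Java properties files support more syntax), and the helper names are hypothetical.

```python
from typing import Dict, Tuple

# Contents of a hypothetical etc/catalog/hive.properties (catalog name: "hive").
HIVE_PROPERTIES = """
# Hive connector backed by a metastore (placeholder host)
connector.name=hive-hadoop2
hive.metastore.uri=thrift://example.net:9083
"""


def parse_properties(text: str) -> Dict[str, str]:
    """Simplified key=value parser: skips blank lines and # comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def resolve_table(name: str, session_catalog: str,
                  session_schema: str) -> Tuple[str, str, str]:
    """Expand a table name to (catalog, schema, table) using session defaults."""
    parts = name.split(".")
    if len(parts) == 3:
        catalog, schema, table = parts
    elif len(parts) == 2:
        catalog = session_catalog
        schema, table = parts
    elif len(parts) == 1:
        catalog, schema, table = session_catalog, session_schema, parts[0]
    else:
        raise ValueError(f"invalid table name: {name!r}")
    return catalog, schema, table
```

With a session defaulting to catalog `tpch` and schema `tiny`, `resolve_table("orders", "tpch", "tiny")` yields `("tpch", "tiny", "orders")`, while a fully qualified name such as `hive.web.page_views` is used as written.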

Query Execution Model

The Presto query engine turns SQL statements into queries and executes them across a distributed cluster made up of a coordinator and workers.

Statement: When Presto parses a statement, it converts it into a query and creates a distributed query plan, which is then realized as a series of interconnected stages running on Presto workers.
Stage: Presto executes a query by breaking the execution into a hierarchy of stages.
Task: Tasks are the “workhorse” of the Presto architecture. A distributed query plan is deconstructed into a series of stages, the stages are translated into tasks, and the tasks act upon, or process, splits.
Split: Tasks operate on splits, which are sections of a larger data set. Stages at the lowest level of a distributed query plan retrieve data via splits from connectors, while intermediate stages at a higher level retrieve data from other stages.
Driver: Drivers act upon data and combine operators to produce output, which a task aggregates and delivers to another task in another stage.
Operator: An operator consumes, transforms and produces data.
Exchange: Exchanges transfer data between Presto nodes for different stages of a query.

[Figure: Presto query execution model]
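The execution concepts above fit together roughly as follows: operators are chained inside a driver, a task runs a driver over each of its splits, and a downstream stage reads the tasks' output through an exchange. The toy Python model below mirrors that shape in a single process. It is an illustration under simplifying assumptions, not Presto's implementation, which runs drivers in parallel on workers, passes batches of columnar data between operators, and moves data between stages over the network. All names are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Row = Tuple[str, int]
# An operator consumes rows and produces rows (here: a generator transform).
Operator = Callable[[Iterable[Row]], Iterator[Row]]


def filter_op(min_value: int) -> Operator:
    """Operator factory: keep rows whose value is at least min_value."""
    def op(rows: Iterable[Row]) -> Iterator[Row]:
        return (row for row in rows if row[1] >= min_value)
    return op


def project_op(rows: Iterable[Row]) -> Iterator[Row]:
    """Operator: transform each row (upper-case the key)."""
    return ((key.upper(), value) for key, value in rows)


def run_driver(split: List[Row], operators: List[Operator]) -> List[Row]:
    """A driver pushes one split through a pipeline of operators."""
    rows: Iterable[Row] = split
    for op in operators:
        rows = op(rows)
    return list(rows)


def run_task(splits: List[List[Row]], operators: List[Operator]) -> List[Row]:
    """A task runs one driver per split and gathers the output."""
    output: List[Row] = []
    for split in splits:
        output.extend(run_driver(split, operators))
    return output


def final_stage(exchange: List[List[Row]]) -> Dict[str, int]:
    """Downstream stage: read upstream task output via the exchange, sum per key."""
    totals: Dict[str, int] = {}
    for task_output in exchange:
        for key, value in task_output:
            totals[key] = totals.get(key, 0) + value
    return totals
```

For example, with `ops = [filter_op(2), project_op]`, running one leaf task on the splits `[[("a", 1), ("b", 5)], [("a", 7)]]` and another on `[[("b", 2), ("c", 9)]]`, then passing both outputs to `final_stage`, returns `{"B": 7, "A": 7, "C": 9}`.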

Common Use Cases

Ad hoc querying

Use SQL to run ad hoc queries whenever you want, wherever your data resides.

Data lake analytics

Query data directly on a data lake without the need for transformation.

Federated querying

Query data across multiple sources, such as databases, data lakes, and more.
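A federated query addresses each table by its fully qualified `catalog.schema.table` name, so one SQL statement can join tables that live behind different connectors. The Python sketch below only assembles such a statement as a string. The `hive` and `mysql` catalog names, and the schemas, tables, and columns, are all hypothetical; the example assumes a Hive catalog and a MySQL catalog have been configured on the cluster.

```python
def qualified(catalog: str, schema: str, table: str) -> str:
    """Build a fully qualified Presto table name: catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"


def federated_revenue_query() -> str:
    """Join data-lake orders with customers from an operational database (hypothetical tables)."""
    orders = qualified("hive", "sales", "orders")       # e.g. files in a data lake via Hive
    customers = qualified("mysql", "crm", "customers")  # e.g. tables in a MySQL database
    return (
        "SELECT c.region, sum(o.amount) AS revenue\n"
        f"FROM {orders} AS o\n"
        f"JOIN {customers} AS c ON o.customer_id = c.id\n"
        "GROUP BY c.region"
    )
```

The resulting statement could be submitted through any Presto client, such as the Presto CLI; the coordinator then plans one query that reads from both catalogs.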