What is Presto? Presto and PrestoDB Explained

PrestoDB, now hosted under the Linux Foundation, is an open source distributed SQL query engine.

We’ve curated the best reads from the PrestoDB community on this page to answer the question: What is Presto?

Getting Started with Presto: Hands-on Guide

Join the PrestoDB Community Slack

Contribute on PrestoDB GitHub

Join the PrestoDB Mailing List

Join a PrestoDB Meetup

What is PrestoDB and Presto SQL? An Overview

What is Presto? Presto is a federated, distributed SQL query engine that runs on a cluster of machines and enables interactive, ad hoc analytics on large amounts of data. With PrestoDB, you use SQL to query data where it lives, whether in Hive, Amazon S3, Hadoop, Cassandra, relational databases, NoSQL databases, or even proprietary data stores. Because a single query can access data from multiple sources, the open source engine supports analytics across an entire organization.

What is Presto SQL? SQL is a standard language for querying and manipulating data. Presto SQL is Presto’s implementation of ANSI SQL, which Presto executes in a distributed manner to query data efficiently in data lakes or databases.

Learn more about Presto architecture and Presto SQL.

A full Presto installation consists of one coordinator and multiple workers. A client, such as the Presto CLI, submits queries to the coordinator, which parses, analyzes, and plans query execution, then distributes the processing to the workers.
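
To make this division of labor concrete, here is a toy, in-process sketch of the coordinator/worker flow for a filtered row count. It is not Presto’s implementation: the Coordinator and Worker classes, the round-robin split assignment, and the hard-coded "plan" are hypothetical simplifications. Real coordinators and workers are separate JVM processes that talk over HTTP, and in a real cluster even the final merge step is part of the distributed plan. Here the coordinator simply sums the partial counts for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, int]
Split = List[Row]  # hypothetical: a "split" is just a list of rows here


@dataclass
class Worker:
    """Hypothetical stand-in for a Presto worker."""
    name: str

    def run_task(self, splits: List[Split], predicate: Callable[[Row], bool]) -> int:
        # Produce a partial result: matching rows in this worker's splits only.
        return sum(1 for split in splits for row in split if predicate(row))


class Coordinator:
    """Hypothetical stand-in for the coordinator: plans and schedules work."""

    def __init__(self, workers: List[Worker]) -> None:
        self.workers = workers

    def execute_count(self, splits: List[Split], predicate: Callable[[Row], bool]) -> int:
        # Scheduling: assign splits to workers round-robin.
        assignments: Dict[str, List[Split]] = {w.name: [] for w in self.workers}
        for i, split in enumerate(splits):
            assignments[self.workers[i % len(self.workers)].name].append(split)
        # Each worker returns a partial count; summing them gives the answer.
        return sum(w.run_task(assignments[w.name], predicate) for w in self.workers)


if __name__ == "__main__":
    table = [[{"amount": a} for a in range(start, start + 5)] for start in (0, 5, 10, 15)]
    coordinator = Coordinator([Worker("worker-1"), Worker("worker-2")])
    print(coordinator.execute_count(table, lambda row: row["amount"] >= 10))  # 10
```

Adding workers changes only how splits are spread out, not the answer. That is the property that lets a Presto cluster scale out.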

Watch: What Is Presto and Why Is It So Important? See How It Helps Today’s Users

For a fuller answer to the question “What is Presto?”, hear from Presto Foundation panel members Facebook, Uber, Ahana™, and Alibaba in a virtual roundtable.

PrestoDB Running in Production At

Facebook
Uber
Twitter
Alibaba
AWS
Google

See more companies on prestodb.io.

Ready to answer the question: What is Presto?

Chat With an Engineer to See Presto in Use

Technical Concepts

Presto Server Types

Presto Coordinator: Responsible for parsing statements, planning queries, and managing Presto worker nodes.
Presto Worker: Responsible for executing tasks and processing data.

Presto Data Sources

Connector: A connector adapts Presto to a data source such as Hive or a relational database. Presto includes several built-in connectors, such as the JMX, Hive, and TPCH connectors.
Catalog: A Presto catalog contains schemas and references a data source via a connector.
Schema: A schema organizes tables within a catalog. Together, a catalog and a schema define a set of tables that can be queried.
Table: A set of unordered rows organized into named columns with types.
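
In SQL, this hierarchy appears as fully qualified table names of the form catalog.schema.table. When a name is only partially qualified, Presto fills in the session’s default catalog and schema. The sketch below models that lookup with plain dictionaries. The resolve function, the in-memory METADATA layout, and the hive and tpch contents are illustrative assumptions, not Presto’s metadata API.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory metadata: catalog -> schema -> table -> column names.
# In Presto, each catalog would be backed by a connector (e.g. Hive or TPCH).
METADATA: Dict[str, Dict[str, Dict[str, List[str]]]] = {
    "hive": {"web": {"page_views": ["view_time", "user_id", "page_url"]}},
    "tpch": {"tiny": {"orders": ["orderkey", "custkey", "totalprice"]}},
}


def resolve(name: str,
            session_catalog: Optional[str] = None,
            session_schema: Optional[str] = None) -> Tuple[str, str, str]:
    """Expand a 1-, 2-, or 3-part table name to (catalog, schema, table)."""
    parts = name.split(".")
    if len(parts) == 3:
        catalog, schema, table = parts
    elif len(parts) == 2:
        # schema.table: the catalog comes from the session.
        catalog = session_catalog
        schema, table = parts
    elif len(parts) == 1:
        # table only: both catalog and schema come from the session.
        catalog, schema, table = session_catalog, session_schema, parts[0]
    else:
        raise ValueError(f"invalid table name: {name!r}")
    if catalog is None or schema is None:
        raise ValueError(f"{name!r} needs a session catalog/schema to resolve")
    try:
        METADATA[catalog][schema][table]
    except KeyError:
        raise LookupError(f"table {catalog}.{schema}.{table} does not exist") from None
    return catalog, schema, table


if __name__ == "__main__":
    print(resolve("tpch.tiny.orders"))                 # ('tpch', 'tiny', 'orders')
    print(resolve("orders", "tpch", "tiny"))           # ('tpch', 'tiny', 'orders')
    print(resolve("web.page_views", session_catalog="hive"))
```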

Query Execution Model

The Presto query engine takes SQL statements, turns them into queries, and executes those queries across a distributed cluster made up of a coordinator and workers.

Statement: When Presto parses a statement, it converts it into a query and creates a distributed query plan, which is then realized as a series of interconnected stages running on Presto workers.
Stage: When Presto executes a query, it breaks the execution into a hierarchy of stages.
Task: Tasks are the “workhorse” of the Presto architecture. A distributed query plan is deconstructed into a series of stages, which are translated into tasks that act upon, or process, splits.
Split: Tasks operate on splits, which are sections of a larger data set. Stages at the lowest level of a distributed query plan retrieve data via splits from connectors, while intermediate stages at a higher level retrieve data from other stages.
Driver: Drivers act upon data and combine operators to produce output, which a task aggregates and delivers to another task in another stage.
Operator: An operator consumes, transforms and produces data.
Exchange: Exchanges transfer data between Presto nodes for different stages of a query.
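
The hierarchy above can be illustrated with a toy two-stage aggregation, roughly what a statement like SELECT region, count(*) FROM t WHERE amount > 0 GROUP BY region needs. Each leaf-stage task runs one driver per split, and each driver chains a filter operator into a partial-aggregation operator. An exchange then hash-partitions the partial results by group key so that every key reaches exactly one final-stage task. All function names here are hypothetical, and the model is a simplification: real Presto operators process columnar pages, and exchanges move data between nodes over the network.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, object]
Split = List[Row]


def filter_operator(rows: Iterable[Row]) -> Iterator[Row]:
    """Operator: keep rows with a positive amount (the WHERE clause)."""
    return (row for row in rows if row["amount"] > 0)


def partial_agg_operator(rows: Iterable[Row]) -> Counter:
    """Operator: count rows per region (partial aggregation)."""
    return Counter(row["region"] for row in rows)


def run_driver(split: Split) -> Counter:
    """Driver: push one split through the chained operators."""
    return partial_agg_operator(filter_operator(split))


def leaf_task(splits: List[Split]) -> Counter:
    """Leaf-stage task: run one driver per split and combine their output."""
    combined: Counter = Counter()
    for split in splits:
        combined.update(run_driver(split))
    return combined


def exchange(partials: List[Counter], num_partitions: int) -> List[List[Tuple[str, int]]]:
    """Exchange: route each (region, partial count) pair by hashing the key."""
    partitions: List[List[Tuple[str, int]]] = [[] for _ in range(num_partitions)]
    for partial in partials:
        for region, count in partial.items():
            partitions[hash(region) % num_partitions].append((region, count))
    return partitions


def final_task(pairs: List[Tuple[str, int]]) -> Dict[str, int]:
    """Final-stage task: sum partial counts; each key reaches only one task."""
    totals: Counter = Counter()
    for region, count in pairs:
        totals[region] += count
    return dict(totals)


def execute(splits_per_task: List[List[Split]], num_final_tasks: int = 2) -> Dict[str, int]:
    partials = [leaf_task(splits) for splits in splits_per_task]  # stage 1 (leaf)
    partitions = exchange(partials, num_final_tasks)              # data movement
    result: Dict[str, int] = {}
    for pairs in partitions:                                      # stage 2 (final)
        result.update(final_task(pairs))
    return result


if __name__ == "__main__":
    task_a = [[{"region": "east", "amount": 5}, {"region": "west", "amount": 0}],
              [{"region": "west", "amount": 3}]]
    task_b = [[{"region": "east", "amount": 2}, {"region": "north", "amount": -1}]]
    print(sorted(execute([task_a, task_b]).items()))  # [('east', 2), ('west', 1)]
```

Splitting the aggregation into partial and final steps means the exchange ships one small count per key per task instead of every raw row.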

Common Use Cases

Ad hoc querying

Use SQL to run ad hoc queries whenever you want, wherever your data resides.

Data lake analytics

Query data directly on a data lake without the need for transformation.

Federated querying

Query data across multiple sources, such as databases, data lakes, and more.
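
A federated query reads from more than one catalog in a single statement, for example joining event data in a data lake with user records in a relational database. The sketch below imitates that idea in memory. Two hypothetical connectors stand in for different systems, and a hash join builds on the right input and probes with the left. The InMemoryConnector class, the lake and crm catalog names, the two-part "catalog.table" naming, and the data are all assumptions. Presto itself would plan such a join and execute it across workers.

```python
from typing import Dict, Iterator, List

Row = Dict[str, object]


class InMemoryConnector:
    """Hypothetical connector: exposes the tables of one data source."""

    def __init__(self, tables: Dict[str, List[Row]]) -> None:
        self.tables = tables

    def scan(self, table: str) -> Iterator[Row]:
        yield from self.tables[table]


# Two catalogs backed by different (simulated) systems.
CATALOGS: Dict[str, InMemoryConnector] = {
    "lake": InMemoryConnector({"events": [
        {"user_id": 1, "action": "click"},
        {"user_id": 2, "action": "view"},
        {"user_id": 1, "action": "view"},
    ]}),
    "crm": InMemoryConnector({"users": [
        {"id": 1, "name": "Ada"},
        {"id": 2, "name": "Lin"},
    ]}),
}


def federated_join(left: str, right: str, left_key: str, right_key: str) -> List[Row]:
    """Hash-join two 'catalog.table' inputs that may live in different catalogs."""
    left_catalog, left_table = left.split(".")
    right_catalog, right_table = right.split(".")
    # Build side: index the right input by its join key.
    build: Dict[object, List[Row]] = {}
    for row in CATALOGS[right_catalog].scan(right_table):
        build.setdefault(row[right_key], []).append(row)
    # Probe side: stream the left input and emit combined matching rows.
    result: List[Row] = []
    for row in CATALOGS[left_catalog].scan(left_table):
        for match in build.get(row[left_key], []):
            result.append({**row, **match})
    return result


if __name__ == "__main__":
    for row in federated_join("lake.events", "crm.users", "user_id", "id"):
        print(row["name"], row["action"])
```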