Presto (and PrestoDB) Architecture and Design

The basics

The Presto architecture and design seamlessly integrates with an existing data ecosystem without needing to modify it. It’s an additional faster data access interface that turbocharges your existing stack. Presto separates compute from storage; its federated query engine lets teams run analytics for a variety of use cases across a wide range of data lakes and databases in-place, with no need to move data. It doesn’t store the data, which has the big advantage of being able to scale resources for queries up and down based on the demand. 

Presto supports a decoupled architecture

Most of the cloud deployments leverage object storage, which is already separated from the compute layer, and auto-scale to optimize resource costs. So the separation of compute and storage make it very suitable for cloud environments–though Presto can be deployed both in the cloud and on premises. 

Learn more about how Presto works with Amazon S3.

Presto nodes and how PrestoDB handles querying

Presto’s architecture is very similar to a classic database management system using cluster computing (MPP). Presto’s architecture consists of one coordinator node working together with multiple worker nodes. Clients enter their SQL queries that then are parsed, analyzed and planned with parallel tasks scheduled out to workers. Workers jointly process rows from the data sources and produce results that the client receives back. Presto does not write intermediate results to disk as its precursor Apache Hive did, and so you get significantly faster query speed. For Presto programming, it’s written in the Java programming language.

Presto data sources and connectors

A single Presto query can combine data from multiple sources. Presto uses connectors to access various data sources, some of the more popular combinations include Presto HDFS, Presto Amazon S3, Presto MySQL, Presto PostgreSQL, Presto Microsoft SQL Server, Presto Redis, Presto Kudu, Amazon Redshift, Apache Phoenix, Apache Kafka, Apache Cassandra, Apache Accumulo, MongoDB and Redis. A connector is similar in concept to a driver for a database. Several of the connectors are built-in to Presto, namely for Hive, JMX, TPC-H benchmark data, and a system connector which provides access to built-in system tables. Presto accesses other data sources using third-party connectors, and more and more are becoming available.