DB Presto

Presto (also referred to as "DB Presto") is a high-performance, distributed SQL query engine for big data.

What it isn’t: Although the official GitHub organization for the open source project, which operates under the auspices of the Presto Foundation, is called PrestoDB, Presto is not a database. Can Presto replace a relational database? No. Users can’t store their data in Presto itself, so it can’t replace relational databases such as PostgreSQL, Oracle, or MySQL. Presto keeps intermediate data in its buffer cache while a query runs, but it is not meant to be used as a persistent storage layer. Further, Presto is not designed to handle online transaction processing (OLTP).

What it is: Presto is an open source, distributed SQL query engine that’s best used for running interactive analytic queries against data sources of all sizes. Presto lets users query data where it lives, whether that is in Hive, Cassandra, relational databases, or proprietary data stores. Users can combine data from multiple sources in a single SQL query, making it possible to analyze data across the entire organization.
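To make the idea of querying data from several sources in one statement concrete, here is a minimal Python sketch. It assumes a coordinator at a placeholder address and two hypothetical catalogs, `mysql` (a relational database) and `hive` (a data lake), with made-up schema and table names. Presto addresses tables as `catalog.schema.table`, which is what lets a single query join across both sources. The sketch builds, but does not send, an HTTP request for Presto's client REST protocol, which accepts SQL text via `POST /v1/statement` along with an `X-Presto-User` header; the helper function names are illustrative only.

```python
import urllib.request

# Placeholder coordinator address (assumption); adjust to your deployment.
COORDINATOR = "http://localhost:8080"


def federated_query() -> str:
    # Hypothetical catalogs: "mysql" (relational DB) and "hive" (data lake).
    # Fully qualified catalog.schema.table names let one query span both.
    return (
        "SELECT c.customer_name, SUM(o.total) AS revenue\n"
        "FROM mysql.shop.customers AS c\n"
        "JOIN hive.sales.orders AS o ON o.customer_id = c.id\n"
        "GROUP BY c.customer_name"
    )


def build_statement_request(sql: str, user: str) -> urllib.request.Request:
    # The client REST protocol takes the SQL text as the POST body of
    # /v1/statement; X-Presto-User identifies the caller.
    # The request is only constructed here, never sent.
    return urllib.request.Request(
        url=f"{COORDINATOR}/v1/statement",
        data=sql.encode("utf-8"),
        headers={"X-Presto-User": user},
        method="POST",
    )


if __name__ == "__main__":
    req = build_statement_request(federated_query(), user="analyst")
    print(req.get_method(), req.full_url)
```

A real client would send this request and then follow the `nextUri` field in each JSON response to page through results. Most users never touch this layer directly and instead go through the Presto CLI, JDBC, or a client library; raw HTTP is used here only because it needs nothing beyond the standard library.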

The PrestoDB project is community-owned and community-driven, and it is supported by the Presto Foundation, an independent, nonprofit organization hosted under the Linux Foundation. Founding members of the Presto Foundation include Facebook, Alibaba, Twitter, and Uber. These members help drive the future direction of the project, with the goal of making Presto the most reliable SQL engine for massively distributed data processing.

Presto is particularly well suited for platform teams that want to offer self-service analytics to the rest of their organization. It supports a broad range of use cases, including ad hoc SQL querying at any time, wherever the data resides; data lake analytics, which queries a data lake directly without first transforming the data; and federated querying, i.e. querying data across multiple sources such as databases, data lakes, and more.

A full Presto deployment includes a coordinator and multiple workers. Analysts submit their queries to the coordinator through a client such as the Presto CLI. The coordinator parses, analyzes, and plans the query execution, then distributes the processing to the workers.
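The coordinator/worker split can be illustrated with a toy scatter-gather model in Python. This is not Presto's planner or scheduler: the hard-coded `SPLITS` stand in for chunks of a table (hypothetical data), each "worker" computes a partial aggregate over its own split, and the `coordinator` function fans the splits out and merges the partial results. For simplicity the final merge happens in the coordinator function here; in a real Presto cluster, the stages of the query plan, including the final aggregation, run as tasks on workers, and the coordinator returns the results to the client.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy data only: each inner list is a "split" of (region, amount) rows.
SPLITS = [
    [("us", 10), ("eu", 5)],
    [("us", 7), ("apac", 3)],
    [("eu", 2), ("apac", 8)],
]


def worker_partial_sum(split):
    # A worker aggregates only the rows in its own split (partial aggregation).
    partial = {}
    for region, amount in split:
        partial[region] = partial.get(region, 0) + amount
    return partial


def coordinator(splits, num_workers=3):
    # Fan the splits out to a pool of simulated workers...
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(worker_partial_sum, splits))
    # ...then merge their partial results into the final answer.
    result = {}
    for partial in partials:
        for region, amount in partial.items():
            result[region] = result.get(region, 0) + amount
    return dict(sorted(result.items()))


if __name__ == "__main__":
    print(coordinator(SPLITS))  # {'apac': 11, 'eu': 7, 'us': 17}
```

Splitting the work into partial and final aggregation is what lets the expensive per-row processing scale out across workers, while only small per-group summaries need to be combined at the end.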