What Is Presto?

Presto is an open source, distributed SQL query engine built for running interactive analytic queries against data sources of all sizes. It is optimized for low-latency, interactive analysis and can scale from gigabytes to petabytes with zero downtime. It approaches the speed of commercial data warehouses and can serve organizations as large as Facebook.

Presto lets users query data where it lives, whether that is in Hive, Cassandra, relational databases or even proprietary data stores. A single query can combine data from multiple sources, making it possible to analyze data across the entire organization.
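As a rough illustration of querying across sources, the sketch below assembles one SQL statement that joins a Hive table with a MySQL table. Presto addresses tables as catalog.schema.table. The catalog, schema, table and column names used here (hive.web.orders, mysql.crm.customers and so on) are hypothetical, chosen only for the example, and the script only builds the query text rather than sending it to a cluster.

```python
def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Return a fully qualified table name in catalog.schema.table form."""
    return f"{catalog}.{schema}.{table}"


def build_federated_query() -> str:
    """Build one SQL statement that reads from two different catalogs."""
    # Hypothetical tables: orders stored in Hive, customer records in MySQL.
    orders = qualified_name("hive", "web", "orders")
    customers = qualified_name("mysql", "crm", "customers")
    return (
        "SELECT c.region, sum(o.total) AS revenue\n"
        f"FROM {orders} AS o\n"
        f"JOIN {customers} AS c ON o.customer_id = c.id\n"
        "GROUP BY c.region"
    )


if __name__ == "__main__":
    print(build_federated_query())
```

The resulting text could be pasted into any Presto client, assuming catalogs with those names were actually configured on the cluster.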

Presto is ideally suited for analysts who want response times ranging from sub-second to minutes. Analysts no longer have to choose between fast analytics on an expensive commercial solution and a slower, less expensive tool that requires excessive hardware.

A full Presto deployment includes a coordinator and multiple workers. Analysts submit queries to the coordinator through a client such as the Presto CLI. The coordinator parses, analyzes and plans the query, then distributes the processing to the workers.
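The coordinator and worker roles described above can be pictured with a toy model. This is only a sketch of the general fan-out and merge pattern, not Presto's actual scheduler: threads stand in for worker nodes, the "query" is a fixed sum grouped by region, and every function name is made up for the illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def plan_splits(rows, num_workers):
    """Coordinator step: divide the input rows into one split per worker."""
    return [rows[i::num_workers] for i in range(num_workers)]


def worker_partial_sum(split):
    """Worker step: compute a partial sum of amounts per region for one split."""
    partial = Counter()
    for region, amount in split:
        partial[region] += amount
    return partial


def run_query(rows, num_workers=3):
    """Coordinator: plan the splits, fan out to workers, merge partial results."""
    splits = plan_splits(rows, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(worker_partial_sum, splits))
    total = Counter()
    for partial in partials:
        total.update(partial)  # adds each worker's per-region sums together
    return dict(total)


if __name__ == "__main__":
    sample = [("east", 10), ("west", 5), ("east", 7), ("north", 1)]
    print(run_query(sample))
```

The point of the sketch is that each worker only sees its own split, and the final answer does not depend on how many workers the rows were spread across.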

The Presto project is community owned and driven, and is supported by the Presto Foundation, an independent nonprofit organization hosted under the Linux Foundation. Founding members of the Presto Foundation include Facebook, Alibaba, Twitter and Uber. These members help drive the future direction of the project, with the goal of making Presto the fastest and most reliable SQL engine for massively distributed data processing.

Presto supports pluggable connectors that provide data for queries, and the requirements vary by connector. Examples include Hive for HDFS or object stores such as S3, MySQL, Elasticsearch, Cassandra, Kafka and more.
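To make the idea of a pluggable connector concrete, here is a minimal sketch in which an engine looks up a data source by catalog name and asks it for rows. The Connector interface, the InMemoryConnector class and the Engine class are all hypothetical and far simpler than Presto's real connector interface. They only show how a common interface lets very different sources sit behind the same query layer.

```python
from typing import Dict, Iterable, List, Protocol, Tuple

Row = Tuple[str, int]


class Connector(Protocol):
    """Hypothetical minimal interface: a connector can scan a named table."""

    def scan(self, table: str) -> Iterable[Row]: ...


class InMemoryConnector:
    """Toy connector backed by a dict, standing in for a real data source."""

    def __init__(self, tables: Dict[str, List[Row]]) -> None:
        self._tables = tables

    def scan(self, table: str) -> Iterable[Row]:
        return iter(self._tables[table])


class Engine:
    """Toy engine that routes a catalog.table name to the registered connector."""

    def __init__(self) -> None:
        self._catalogs: Dict[str, Connector] = {}

    def register(self, catalog: str, connector: Connector) -> None:
        self._catalogs[catalog] = connector

    def scan(self, qualified_table: str) -> List[Row]:
        catalog, table = qualified_table.split(".", 1)
        return list(self._catalogs[catalog].scan(table))


if __name__ == "__main__":
    engine = Engine()
    engine.register("memory", InMemoryConnector({"clicks": [("home", 3)]}))
    print(engine.scan("memory.clicks"))
```

Adding another source in this model means registering another object that implements scan, without changing the engine itself.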