Presto Platform Overview
The Presto platform is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes. PrestoDB was developed from the ground up by the engineers at Meta. Today, some of the world’s most well-known, innovative, and data-driven companies, such as Twitter, Uber, Walmart, and Netflix, depend on Presto for querying data sets ranging from gigabytes to petabytes in size. Facebook, for example, still uses Presto for interactive queries against several internal data stores, including its 300PB data warehouse. Over 1,000 Facebook employees use Presto daily to run more than 30,000 queries that together scan over a petabyte of data per day.
The Presto platform was designed and written from scratch to handle interactive analytics. It approaches the speed of commercial data warehouses while scaling to the size of organizations like Airbnb or Twitter.
Presto allows users to query data where it lives, including Hive, Cassandra, relational databases, HDFS, object stores, and even proprietary data stores. A single Presto query can combine data from multiple sources, which allows for quick and accurate analytics across your entire organization. Presto is an in-memory, distributed, parallel system.
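To make the idea of combining sources in one query more concrete, here is a minimal sketch in Python that assembles such a query as a string. Presto addresses tables as catalog.schema.table, where the catalog corresponds to a configured connector. The catalog, schema, table, and column names below (hive.sales.orders, mysql.crm.customers, and so on) are hypothetical and chosen only for illustration.

```python
# Hypothetical catalog, schema, table, and column names, for illustration only.

def qualified(catalog: str, schema: str, table: str) -> str:
    """Return a fully qualified Presto table name: catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"


def build_federated_query() -> str:
    """Build one SQL statement that joins tables from two different catalogs."""
    orders = qualified("hive", "sales", "orders")        # e.g. files in a data lake
    customers = qualified("mysql", "crm", "customers")   # e.g. a relational database
    return (
        "SELECT c.region, sum(o.total) AS revenue\n"
        f"FROM {orders} AS o\n"
        f"JOIN {customers} AS c ON o.customer_id = c.id\n"
        "GROUP BY c.region"
    )


if __name__ == "__main__":
    print(build_federated_query())
```

The resulting statement could be submitted through any Presto client. The point of the sketch is only that both tables appear in a single query even though they live in different systems.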
Presto is targeted at data analysts and data scientists who expect response times ranging from sub-second to minutes. The Presto platform breaks the false choice between fast analytics on an expensive commercial solution and a slow “free” solution that requires excessive hardware.
The Presto platform is composed of:
- Two types of Presto servers: coordinators and workers.
- One or more connectors: Connectors link Presto to a data source, such as Hive or a relational database. You can think of a connector the same way you think of a driver for a database.
- A cost-based query optimizer and execution engine, together with a parser, planner, and scheduler.
- Drivers for connecting tools, including JDBC, as well as the Presto-cli tool and the Presto Console.
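The division of labor between coordinators and workers can be sketched with a toy model in Python. This is not Presto code and does not reflect its internals. It is a simplified illustration, under the assumption that a coordinator plans a query into independent splits, workers each process one split in parallel, and the coordinator merges the partial results. All function names and the sample data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy in-memory "data source"; a real connector would read from Hive, a database, etc.
ROWS = [
    {"region": "eu", "total": 10},
    {"region": "us", "total": 5},
    {"region": "eu", "total": 7},
    {"region": "us", "total": 3},
]


def make_splits(rows, n_splits):
    """Connector role (simplified): divide the source data into independent splits."""
    return [rows[i::n_splits] for i in range(n_splits)]


def worker_task(split):
    """Worker role: compute a partial sum per region over one split."""
    partial = {}
    for row in split:
        partial[row["region"]] = partial.get(row["region"], 0) + row["total"]
    return partial


def coordinator(rows, n_workers=2):
    """Coordinator role: plan splits, schedule them on workers, merge the results."""
    splits = make_splits(rows, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(worker_task, splits))
    final = {}
    for partial in partials:
        for region, total in partial.items():
            final[region] = final.get(region, 0) + total
    return final


if __name__ == "__main__":
    print(coordinator(ROWS))
```

The answer does not depend on the number of workers, only the amount of parallelism does, which is the property that lets a system of this shape scale by adding workers.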
The community-owned and community-driven PrestoDB project is supported by the Presto Foundation, an independent nonprofit organization with open and neutral governance, hosted under the Linux Foundation®. Presto software is released under the Apache License 2.0.
Curious about how you can get going with the Presto platform? Ahana offers a managed service for Presto in the cloud. You can get started for free today with the Ahana Community Edition or a free trial for the full edition. The Ahana Community Edition is a free forever version of the Ahana Cloud managed service.