Presto Performance: Speed, Optimization & Tuning
Presto is an open source distributed ANSI SQL query engine for analytics. Presto supports the separation of compute and storage (i.e. it queries data that is stored externally – for example in Amazon S3 or in RDBMSs). Performance is particularly important for ad-hoc analytical queries so Presto has a number of design features to maximize Presto speed such as in-memory pipelined execution, a distributed scale-out architecture, and massively parallel processing (MPP) design. In terms of specific performance features Presto supports:
- Compression (SNAPPY, LZ4, ZSTD, and GZIP)
- Table statistics – collected by the ANALYZE command and stored in a Hive or Glue metastore – give Presto’s query planner insights into the shape, size and type of data being queried, and whether or not the data source supports pushdown operations like filters and aggregates.
- Presto uses a cost-based optimizer, which as you would often expect depends on collected table statistics for optimal functioning.
As Presto is an in-memory query engine, it can only process data as fast as the storage layer can provide it. There are MANY different types of storage that can be queried by Presto, some faster than others. So if you can choose the fastest data source this will boost Presto’s speed. This may involve tuning the source to reduce latency, increase throughput, or both. Or switching from accessing a data source that is busy dealing with lots of users and therefore high levels of contention, to an alternative – perhaps a read replica of a database. Or creating indexes, or accessing a pre-aggregated version of the data. Or perhaps moving portions of frequently used data from object storage to a faster storage layer like a RDBMS in order to meet a strict query SLA. Other suggestions include switching to one of Presto’s supported file formats that features performance optimizations like ORC or Parquet, and consider enabling compression.
Presto Performance Tips
The following recommendations can help you achieve maximum performance from your Presto clusters:
- Configure Presto’s coordinator and workers to run on separate instances/servers in production deployments. It is only recommended to have the coordinator and worker share the same instance for very small scale dev/test use.
- Always adjust Presto’s java memory configuration according to the available memory resources of your instance. There is no “default”, so the etc/jvm.config file on each node needs to be configured before your start Presto. A useful rule of thumb is: In each node’s jvm.config set -Xmx to 80% of the available physical memory initially, then adjust later based on your monitoring of the workloads.
- If using HDFS or S3 storage for example, consider using ORC format for your data files. There are numerous optimisations in Presto for ORC such as columnar reads (Presto is a columnar query engine), predicate pushdown, and skipping reads of portions of files if the data is sorted (or mostly sorted) on the filtering columns.
- Use partitioning. You can create a partitioned version of a table with a CTAS https://prestodb.io/docs/current/sql/create-table-as.html by adding the partitioned_by clause to the CREATE TABLE.
- Use bucketing. Do this by adding the bucketed_by clause to your CREATE TABLE statement. You will also need to specify bucket_count.
- If you have a choice of metastore, choose Hive instead of Glue for your Presto cluster(s). Hive has some features that Glue does not, like column-level statistics and dynamic filtering which can boost query performance. The final decision will depend on your particular mix of systems and feature requirements. Here’s a summary of the differences:
- Collect table statistics to ensure the most efficient query plan is produced, which means queries run as fast as possible. Use the sql ANALYZE TABLE <tablename> command to do this. Repeat the ANALYZE TABLE commands for all tables involved in queries on a regular basis, typically when data has substantially changed (e.g. new data arrived / after an ETL cycle has completed).
- In conjunction with the above, if you are exploiting partitioning, make sure you update the partitioning information that’s stored in your metastore. For example to sync the metastore with the partitioning in the table default.customer use CALL system.sync_partition_metadata(‘default’, ‘customer’, ‘full’); Do this right after you create the table, and repeat this when new partitions are added.
- Presto does not do automatic JOIN re-ordering by default. This only happens when the cost-based optimisation (CBO) feature is enabled (see below). By default you (or the application) need to make sure that smaller tables appear on the right side of the JOIN keyword. Remember: “LARGE LEFT” (put the large table on the left side of the join).
- If your queries are performing table joins you should try enabling the cost-based optimisation (CBO) feature – use:
- SET session join_distribution_type=’AUTOMATIC’;
- SET session join_reordering_strategy=’AUTOMATIC’;
(both SETs are needed – and are persisted until session has ended/log out).
- You should enable Dynamic Filtering when 1 or more joins are in-play, especially if there’s a smaller dimension table being used to probe a larger fact table for example. Dynamic Filtering is pushed down to ORC and Parquet readers, and can accelerate queries on partitioned as well as non-partitioned tables. Dynamic Filtering is a join optimization intended to improve performance of Hash JOINs. Enable this with:
- SET session enable_dynamic_filtering=TRUE;
- SET session enable_dynamic_filtering=TRUE;
- If practical, try ordering/sorting your tables during ingestion. This can greatly improve performance – especially the effectiveness of Dynamic Filtering.
- Monitor for Coordinator node overload. If your PrestoDB cluster has many (>50) workers then depending on workload and query profile, your single coordinator node could be overloaded. The coordinator node has many duties, like parsing, analysing, planning and optimising queries, consolidating results from the workers, task tracking and resource management. Add to that the burden of all the internal communication with the other nodes in the cluster being fairly heavyweight JSON over http and you can appreciate how things could begin to slow down at scale. (Note Presto projects like the “disaggregated coordinator” Fireball project aim to eliminate Presto’s single coordinator bottleneck). In the meantime try increasing the resources available to the Coordinator by running it on a larger cloud instance, as more CPU and memory could help.
- Choose the right instances for your workers to ensure they have enough I/O. Picking the right kind of instance for worker nodes is important. Most analytical workloads are IO intensive so the amount of network IO available can be a limiting factor. Overall throughput will dictate query performance. Consider choosing higher Network IO instances for the workers – for example on AWS you can do this by looking at each instance type’s “network performance” rating – here are the ratings for the m4 instances:
It’s a good idea to monitor the IO activity of your worker nodes to determine if there’s an IO bottleneck.
- Consider enabling Resource Groups. This is Presto’s workload manager and it’s used to place limits on resource usage, and can enforce queueing policies on queries that run within them or divide their resources among sub-groups. A query belongs to a single resource group, and consumes resources from that group (and its ancestors). A resource group represents the available Presto resources packaged together, with limits related to CPU, memory, concurrency, queueing, and priority. Except for the limit on queued queries, when a resource group runs out of a resource it does not cause running queries to fail; instead new queries become queued. A resource group may have sub-groups or may accept queries, but may not do both. More details in the documentation: https://prestodb.io/docs/current/admin/resource-groups.html
Finally, to assist with speed tests Presto has TPC-DS and TPC-H catalogs built-in to generate data for benchmarking purposes at varying scales. For example the 1TB TPC-H dataset consists of approximately 8.66 billion records, in 8 tables.
Need more speed? Deploy Presto in the cloud and then scale your cluster to as many or as few instances as you need, when you need them. This “elasticity” is being made increasingly automatic by Ahana’s fully managed cloud-native Presto service.
We hope these Presto performance recommendations help you get the most out of Presto. If you’re interested in getting started with Presto, check out the Ahana Cloud platform, a managed service for Presto in AWS.