How to Make The Presto Query Engine Run Fastest
How do you make the Presto query engine run as fast as possible? The following list of things to consider, check and tune will help you get the most out of your Presto clusters:
- How many workers do you have in your cluster? If your PrestoDB cluster has many (>50) workers then depending on workload and query profile, your single coordinator node could be overloaded. The coordinator node has many duties, like parsing, analysing, planning and optimising queries, consolidating results from the workers, task tracking and resource management. Add to that the burden of all the internal communication with the other nodes in the cluster being fairly heavyweight JSON over http and you can appreciate how things could begin to slow down at scale. (Note Presto projects like the “disaggregated coordinator” Fireball project aim to eliminate Presto’s single coordinator bottleneck). In the meantime try increasing the resources available to the Coordinator by running it on a larger server/cloud instance, as more CPU and memory could help.
- Have you configured Presto and memory usage correctly? It is often necessary to change the default memory configuration based on your cluster’s capacity. The default max memory for a Presto server is 16 GB, but if you have a lot more memory available, you may want to allocate more memory to Presto for better performance. See https://prestodb.io/presto-admin/docs/current/installation/presto-configuration.html for configuration details. One rule of thumb: In each node’s jvm.config, set -Xmx to 80% of the available memory initially, then adjust later based on your monitoring of the workloads.
- What kind of instances are your worker nodes running on – do they have enough I/O? Picking the right kind of instance for worker nodes is important. Most analytical workloads are IO intensive so the amount of network IO available can be a limiting factor. Overall throughput will dictate query performance. Consider choosing higher Network IO instances for the workers – for example on AWS you can do this by looking at each instance type’s “network performance” rating – here are the ratings for the m4 type:
- Optimize your metadata / data catalog: Using Presto’s Hive connector for your metastore, like many users do, will mean practically every query will access the Hive metastore for table and partition details etc. During peak time that generates a high load on the metastore which can slow down query performance. To alleviate this consider:
- Setup multiple catalogs. Configure PrestoDB to use multiple thrift metastore endpoints – Presto’s Hive connector supports configuring multiple hive metastore endpoints which are tried in round-robin by the coordinator. https://prestodb.io/docs/current/connector/hive.html
- Enable Hive metastore and carefully tweak cache eviction configurations and TTLs suitable for your data refresh policies
- Do you have a separate coordinator node? With PrestoDB you can have a single node act as both a coordinator and worker, which can be useful for tiny clusters like sandboxes for testing purposes but it’s obviously not optimal in terms of performance. It is nearly always recommended to have the coordinator running on a separate node to the workers for anything other than sandbox use. Tip: Check your nodes’ Presto etc/config.properties files to determine which one is the coordinator (look for coordinator=true)
- Is memory exhausted? Presto uses an in-memory, pipelining processing architecture and its operation is dependent on the available JVM which in turn is dependent on how much memory Presto is configured to use and how much memory is physically available in the server or instance it is running in.
- The workers can be memory hungry when processing very large queries. Monitor their memory usage and look for failed queries. Allocate more memory if necessary and switch to using a more memory-rich machine if practical.
- The coordinator should be allocated a significant amount of memory – often more than a worker – depending on several factors like workload, the resources available, etc. It’s not uncommon to see the coordinator alone consuming several tens’ of GBs of memory.
The good news is there is memory information available in at least two places:
1. Presto’s built-in JMX catalog can help your monitor memory usage with various counters. Read more about memory pools, limits and counters at https://prestodb.io/blog/2019/08/19/memory-tracking.
2. There is also the Presto Console which reveals, for each query, the reserved, peak and cumulative memory usage.
- When was the last time you restarted your Presto cluster? Sometimes, restarting any kind of software can solve all sorts of issues, including memory leaks and garbage collection.
- Examine query EXPLAIN query plans. If you have a slow running query then check the query plan by prefixing your query with EXPLAIN or EXPLAIN ANALYZE https://prestodb.io/docs/current/sql/explain-analyze.html
- Is your Presto cluster configured for autoscaling based on CPU usage? If so check the configuration is what you expect it to be.
- Does IO and CPU utilisation look balanced? Check CPU usage on Presto workers: if their CPUs are not fully saturated, it might indicate the number of Presto worker threads can be made higher, or the number of splits in a batch is not high enough. You can check these stats in the Presto Console’s UI “Cluster Overview”.
- Have you checked your data volumes recently? An obvious one to check but data volumes can grow unexpectedly and sometimes growth peaks occur without you noticing. The queries may simply be taking longer because there is x% more data than last month.
Other configuration settings for Task concurrency, initial splits per node, join strategy, driver tasks… PrestoDB has around 82 system configurations and 50+ hive configuration settings which users can tweak, many at the query level. These are however for advanced users, which falls outside the scope of this article. More information can be found in the PrestoDB documentation.
We hope the above information makes your Presto clusters run as fast as they can.