Presto queries running slow?
There are a few reasons that Presto performance is impacted and queries might be running slow. Below we’ll share some things to do in terms of diagnosis and Presto tuning, as well as possible solutions to address the issue. These can help with performance and Presto usage.
- How many workers do you have in your cluster? If your PrestoDB cluster has many (>50) workers then depending on workload and query profile, your single coordinator node could be overloaded. The coordinator node has many duties, like parsing, analyzing, planning and optimizing queries, consolidating results from the workers, task tracking and resource management. Add to that the burden of all the internal communication with the other nodes in the cluster being fairly heavyweight JSON over http and you can appreciate how things could begin to slow down at scale. (Note Presto projects like the “disaggregated coordinator” Fireball project aim to eliminate Presto’s single coordinator bottleneck). In the meantime try increasing the resources available to the Coordinator by running it on a larger cloud instance, as more CPU and memory could help. You may also run into issues if there’s Presto multiple users.
- Have you configured Presto and memory usage correctly? It is often necessary to change the default memory configuration based on your cluster’s capacity. The default max memory for a Presto server is 16 GB, but if you have a lot more memory available, you may want to allocate more memory to Presto for better performance. See https://prestodb.io/presto-admin/docs/current/installation/presto-configuration.html for configuration details. One rule of thumb: In each node’s jvm.config, set -Xmx to 80% of the available memory initially, then adjust later based on your monitoring of the workloads.
- What kind of instances are your worker nodes running on – do they have enough I/O? Picking the right kind of instance for worker nodes is important. Most analytical workloads are IO intensive so the amount of network IO available can be a limiting factor. Overall throughput will dictate query performance. Consider choosing higher Network IO instances for the workers – for example on AWS you can do this by looking at each instance type’s “network performance” rating – here are the ratings for the m4 type:
- Optimize your metadata / data catalog: Using Presto’s Hive connector for your metastore, like many users do, will mean practically every query will access the Hive metastore for table and partition details etc. During peak time that generates a high load on the metastore which can slow down query performance. To alleviate this consider:
- Setup multiple catalogs. Configure PrestoDB to use multiple thrift metastore endpoints – Presto’s Hive connector supports configuring multiple hive metastore endpoints which are tried in round-robin by the coordinator. https://prestodb.io/docs/current/connector/hive.html
- Enable Hive metastore and carefully tweak cache eviction configurations and TTLs suitable for your data refresh policies
- Do you have a separate coordinator node? With PrestoDB you can have a single node act as both a coordinator and worker, which can be useful for tiny clusters like sandboxes for testing purposes but it’s obviously not optimal in terms of performance. It is nearly always recommended to have the coordinator running on a separate node to the workers for anything other than sandbox use. Tip: Check your nodes’ Presto etc/config.properties files to determine which one is the coordinator (look for coordinator=true)
- Is memory exhausted? Presto uses an in-memory, pipelining processing architecture and its operation is dependent on the available JVM which in turn is dependent on how much memory Presto is configured to use and how much memory is physically available in the server or instance it is running in.
- The workers can be memory hungry when processing very large queries. Monitor their memory usage and look for failed queries. Allocate more memory if necessary and switch to using a more memory-rich machine if practical.
- The coordinator should be allocated a significant amount of memory – often more than a worker – depending on several factors like workload, the resources available, etc. It’s not uncommon to see the coordinator alone consuming several tens’ of GBs of memory.
- The good news is there is memory information available in at least two places:
- Presto’s built-in JMX catalog can help your monitor memory usage with various counters. Read more about memory pools, limits and counters at https://prestodb.io/blog/2019/08/19/memory-tracking.
- There is also the Presto Console which reveals, for each query, the reserved, peak and cumulative memory usage.
- When was the last time you restarted your Presto cluster? Sometimes, restarting any kind of software can solve all sorts of issues, including memory leaks and garbage collection.
- Is your Presto cluster configured for autoscaling based on CPU usage? If so check the configuration is what you expect it to be.
- Does IO and CPU utilisation look balanced? Check CPU usage on Presto workers: if their CPUs are not fully saturated, it might indicate the number of Presto worker threads can be made higher, or the number of splits in a batch is not high enough.
- Have you checked your data volumes recently? An obvious one to check but data volumes can grow in fits and starts and sometimes peaks occur unexpectedly. The queries may simply be taking longer because there is x% more data than last month.
Other configuration settings for Task concurrency, initial splits per node, join strategy, driver tasks… PrestoDB has around 82 system configurations and 50+ hive configuration settings which users can tweak, many at the query level. These are however for advanced users, which falls outside the scope of this article. More information can be found in the PrestoDB documentation.
As you can tell, there’s a lot to configure and tune when it comes to addressing Presto performance issues. To make it easier, you can use Ahana Cloud, SaaS for Presto. It’s available in the AWS Marketplace and is pay as you go. Check out our free trial at https://ahana.io/sign-up