How Much Memory To Give A Presto Worker Node
Presto is an in-memory query engine, so memory configuration and management matter. A common question is: how much memory should I give a worker node?
JVM Memory
Presto’s JVM memory setting nearly always needs to be changed: you shouldn’t be running Presto with its default setting of 16GB of memory per worker/coordinator. That value is the maximum amount of memory the Presto JVM can use.
The -Xmx flag specifies the maximum memory allocation pool for a Java virtual machine. Change the -Xmx16G in the jvm.config file to a number based on your cluster’s capacity and number of nodes. See https://prestodb.io/presto-admin/docs/current/installation/presto-configuration.html for how to do this.
Rule of Thumb
It is recommended that you set aside 15-20% of total physical memory for the OS. For example, if you are using EC2 “r5.xlarge” instances, which have 32GB of memory, then 32GB - 20% = 25.6GB, so you would use -Xmx25G in the jvm.config file for both coordinator and worker (or -Xmx27G if you want to go with 15% for the OS, since 32GB - 15% = 27.2GB).
This assumes there are no other services running on the server/instance, so the maximum amount of memory can be given to Presto.
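The rule of thumb above is simple arithmetic, and a small helper makes it easy to repeat for other instance sizes. The following is a minimal sketch, not part of Presto: the function name xmx_flag and its parameters are hypothetical, and it assumes memory is given in whole gigabytes and that the result is rounded down, as in the r5.xlarge example.

```python
def xmx_flag(total_gb: int, os_reserve_pct: int = 20) -> str:
    """Return a -Xmx flag that leaves os_reserve_pct percent of RAM for the OS.

    Hypothetical helper for illustration; the 15-20% range comes from the
    rule of thumb in this article.
    """
    if total_gb <= 0:
        raise ValueError("total_gb must be positive")
    if not 0 <= os_reserve_pct < 100:
        raise ValueError("os_reserve_pct must be in the range 0-99")
    # Integer arithmetic rounds down, so 25.6GB becomes 25 and 27.2GB becomes 27.
    heap_gb = total_gb * (100 - os_reserve_pct) // 100
    return f"-Xmx{heap_gb}G"


print(xmx_flag(32))      # 20% reserved for the OS
print(xmx_flag(32, 15))  # 15% reserved for the OS
```

For a 32GB instance this reproduces the two values given above, -Xmx25G and -Xmx27G. Rounding down rather than to the nearest gigabyte is a deliberately conservative choice, so the OS never receives less than the reserved share.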
Presto Memory
As with the JVM settings above, there are two memory-related settings that you should check before starting Presto. For most workloads, Presto’s other memory settings will work perfectly well when left at their defaults. However, there are configurable parameters that control memory allocation and can be useful for specific workloads. The practical guidelines below will help you decide 1) whether you need to change your Presto memory configuration, and 2) which parameters to change.
Workload Considerations
You may want to change Presto’s memory configuration to optimise for ETL workloads versus analytical workloads, or for high query concurrency versus single-query scenarios. There’s a great in-depth blog post on Presto’s memory management, written by one of Presto’s contributors, at https://prestodb.io/blog/2019/08/19/memory-tracking which will guide you in making more detailed tweaks.
Configuration Files & Parameters
When first deploying Presto, there are two memory settings that need checking. Locate the config.properties files for both the coordinator and worker. There are two important parameters here: query.max-memory-per-node and query.max-memory. Again, see https://prestodb.io/presto-admin/docs/current/installation/presto-configuration.html for rules of thumb and how to configure these parameters based on the available memory.
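To show how the two parameters relate to each other, here is a rough sketch that renders the corresponding config.properties lines. It is an illustration under stated assumptions, not a recommendation from the Presto documentation: the helper memory_properties is hypothetical, per_node_fraction (the share of the JVM heap a single query may use on one node) is a value you must choose yourself, and the cluster-wide limit is simply taken as the per-node limit multiplied by the number of workers.

```python
def memory_properties(heap_gb: int, workers: int, per_node_fraction: float) -> str:
    """Render the two memory lines for config.properties (illustrative values only)."""
    if heap_gb <= 0 or workers <= 0:
        raise ValueError("heap_gb and workers must be positive")
    if not 0 < per_node_fraction < 1:
        raise ValueError("per_node_fraction must be between 0 and 1")
    # Assumption: one query may use this share of each node's JVM heap.
    per_node_gb = int(heap_gb * per_node_fraction)
    if per_node_gb < 1:
        raise ValueError("resulting per-node limit is below 1GB")
    # Assumption: the cluster-wide limit is the per-node limit times the worker count.
    cluster_gb = per_node_gb * workers
    return "\n".join([
        f"query.max-memory-per-node={per_node_gb}GB",
        f"query.max-memory={cluster_gb}GB",
    ])


# Example: 25GB heap (the -Xmx25G case), 4 workers, half the heap per query.
print(memory_properties(heap_gb=25, workers=4, per_node_fraction=0.5))
```

The numbers in the example are placeholders; check the linked documentation for the rules of thumb that apply to your Presto version before copying any value into a real cluster.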
The above guidelines and links should help you decide how much memory to give a worker node. As you can see, there are a lot of tuning options and configs to manage: over 200. To avoid all memory configuration work, you can use Ahana Cloud for Presto, a fully managed service for Presto that needs zero configuration. Sign up for a free trial at https://ahana.io/sign-up.