Enabling spill to disk for optimal price per performance

Ahana

Presto was born out of the need for low-latency interactive queries on large scale data, and hence, continually optimized for that use case. In such scenarios, the best practice is to properly size Presto Worker nodes such that all the aggregate cluster memory can fit all the data required for target data sources, queries, and concurrency level. In addition, to ensure fairness of memory across queries and prevent deadlocks, by default, Presto will kill queries that exceed configured memory limits.

The Case for Spill to Disk

As Presto usage and adoption continues to grow, it is being used for more and more different use cases. For some of these use cases, the need for full memory bandwidth and low-latency is not necessary. For example, consider long running queries on large historical data, such as logs, where low latency results is not paramount. In these cases, it may be acceptable, and even more optimal overall, to trade some performance for cost savings. One way to achieve this is, of course, to use lower memory Presto Workers. However, perhaps these longer batch workloads where higher latency is tolerable is not the norm, but the minority case. Enter Presto spill to disk functionality where Presto can be configured to spill intermediate data from memory to disk when needed. While queries that spill to disk have longer execution times compared to an entire in-memory equivalent, the query will not fail due to exceeding configured memory properties.

Cost Savings of Spill to Disk

Let’s walk through a practical example of a spill to disk scenario. A 15 Worker Presto cluster of r5.2xlarge instances (64 GB memory, 8 vCPU, $0.5 per hour) in AWS costs about $180 per day with an aggregate cluster memory of close to 1 TB (960 GB actual). Instead of a 15 Worker Presto cluster, if we had a cluster with 30% less Presto Workers at 10 nodes, we would be decreasing the aggregate cluster memory by 320 GB. But, let’s say with augment the cluster with 1 TB (> 3x of 350 GB) of disk storage across all the remaining 10 nodes (100 GB per node) to leverage Presto disk spilling. At $0.10 per GB-month for gp2 EBS volumes, the storage costs is only $100 per month. The storage costs is less than a 1% of the memory costs, even with a 3x factor.

Spill to Disk Configuration

There are several configuration properties that need to be set to use spill to disk and they are documented in the Presto documentation. Here is an example configuration with 50 GB of storage allocated to each Worker for spilling.

experimental.spiller-spill-path=/path/to/spill/directory
experimental.spiller-max-used-space-threshold=0.7
experimental.max-spill-per-node=50GB
experimental.query-max-spill-per-node=50GB
experimental.max-revocable-memory-per-node=50GB
  • experimential.spiller-spill-path: Directoy where spilled content will be written.
  • experimental.spiller-max-used-space-threshold: If disk space usage ratio of a given spill path is above this threshold, the spill path will not be eligible for spilling.
  • experimental.max-spill-per-node: Max spill space to be used by all queries on a single node.
  • experimental.query-max-spill-per-node: Max spill space to be used by a single query on a single node.
  • experimental.max-revocable-memory-per-node: How much revocable memory any one query is allowed to use.

Conclusion

Several large scale Presto deployments take advantage of spill to disk, including Facebook. Today, the Ahana Cloud managed Presto service enables spill to disk by default and sets the per node spill to disk limit at 50 GB. We will be releasing the ability for customers to configure and tune their per node spill-to-disk size soon. Give it a try. You can sign up and start using our service today for free.