Presto Cluster Autoscaling
Tip: if you are looking for a more thorough understanding of using PrestoDB check out the free technical ebook, Learning and Operating Presto.
We are beyond excited to announce that autoscaling is now available on Ahana Cloud to help you maintain a Presto cluster. In this initial release, the autoscaling feature monitors the average CPU utilization of your PrestoDB worker nodes and scales out when it reaches the 75% threshold. In addition, Presto clusters can now scale in to a minimum number of worker nodes when the cluster is idle for a user-specified amount of time.
Never run out of memory with autoscaling
One of the challenges of running a Presto cluster is deciding how many worker nodes are required to run your queries. Not all queries are equal, and predicting how many nodes will be needed is not always possible. With the scale-out feature, the number of worker nodes increases based on CPU utilization to ensure that your queries can execute without running out of memory. That way you don't have to worry about whether your deployment can support your requirements. Future iterations will include scale-in based on CPU utilization and autoscaling based on additional metrics.
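To make the scale-out behavior concrete, here is a minimal sketch of the decision described above: compare average worker CPU utilization against the 75% threshold and grow the cluster by the step size, capped at the maximum. This is an illustration only, not Ahana Cloud's actual implementation; the function and parameter names are hypothetical.

```python
# Illustrative sketch (not Ahana's implementation) of the scale-out rule:
# when average worker CPU utilization reaches 75%, add `step_size` workers,
# never exceeding the configured maximum worker count.

CPU_THRESHOLD = 0.75  # the scale-out trigger described in the article


def scale_out_target(cpu_utils, current_workers, step_size, max_workers):
    """Return the worker count after one evaluation cycle."""
    avg_cpu = sum(cpu_utils) / len(cpu_utils)
    if avg_cpu >= CPU_THRESHOLD:
        return min(current_workers + step_size, max_workers)
    return current_workers
```

For example, with three workers averaging 80% CPU and a step size of 2, the cluster grows to 5 workers; once the maximum is reached, further triggers have no effect.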
Save cost with Idle time
When no queries are sent to a Presto cluster, it makes sense to reduce the number of worker nodes, but it's not always practical to do so manually. With the Idle time feature enabled, the system monitors query activity; if no activity is detected for a user-defined period of time, say 15 minutes, the number of worker nodes is reduced to its minimum count.
Two of the most common use cases we have found that benefit greatly from idle time cost saving are transformation workloads and ad hoc querying.
- For transformation workloads, a query can potentially run for several hours, making it impractical to monitor activity and decide when to manually stop the cluster or reduce the number of running nodes. Idle time cost saving waits for a certain period of inactivity and then automatically reduces the worker node count to its minimum until the next query hits the cluster.
- For ad hoc querying, as the name suggests, the querying is not continuous, and scaling in to a minimum worker node count between queries helps reduce costs.
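The idle scale-in behavior from both use cases can be sketched in a few lines: track the timestamp of the last query and, once the idle window elapses, drop to the minimum worker count. Again, this is a hypothetical illustration; the names and the 15-minute window are taken from the example above, not from Ahana's code.

```python
# Illustrative sketch (not Ahana's implementation) of idle-time scale-in:
# if no query has arrived within the idle window, shrink the cluster to
# its minimum worker count.
import time

IDLE_TIMEOUT_SECS = 15 * 60  # the 15-minute example window from the article


def scale_in_if_idle(last_query_ts, current_workers, min_workers, now=None):
    """Return the worker count after checking the idle window."""
    now = time.time() if now is None else now
    idle_for = now - last_query_ts
    if idle_for >= IDLE_TIMEOUT_SECS and current_workers > min_workers:
        return min_workers
    return current_workers
```

A cluster that has been idle for the full window collapses to its minimum size in one step; any new query activity resets the clock before the next check.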
Enabling autoscaling with a Presto Cluster
Getting started with autoscaling a Presto cluster is easy. We'll demonstrate just how simple it is with this step-by-step walkthrough.
Step 1 – First, in your cluster settings, select the Scale Out only (CPU) scaling strategy
Step 2 – Enter a minimum and a maximum worker node count as well as a Scale Out step size. The scale-out step size determines how many nodes are added to the cluster when scale-out is triggered
Step 3 – By default, the cluster will resize to the minimum worker node count defined above after 30 minutes of inactivity; this can be set to anywhere between 10 minutes and 1 hour
Your new Presto cluster will scale out up to its maximum worker node count as long as the average CPU utilization of the worker nodes exceeds 75%. If no queries reach the cluster for a default period of 30 minutes, the cluster will reduce its worker node count to its minimum.
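The settings collected in steps 1–3 can be thought of as a small policy object with a couple of validity constraints, for instance the 10-minute-to-1-hour bound on the idle timeout. The sketch below is purely illustrative; the field names are assumptions and do not correspond to Ahana Cloud's API.

```python
# Hypothetical container for the scaling settings described in steps 1-3.
# Field names are illustrative, not Ahana Cloud API names.
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    min_workers: int
    max_workers: int
    scale_out_step: int
    idle_timeout_mins: int = 30  # default; valid range is 10-60 per the article

    def __post_init__(self):
        if not 10 <= self.idle_timeout_mins <= 60:
            raise ValueError("idle timeout must be between 10 and 60 minutes")
        if self.min_workers > self.max_workers:
            raise ValueError("minimum worker count cannot exceed maximum")
```

Validating the configuration up front, rather than at scaling time, means a misconfigured cluster is rejected before it ever runs.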
Enabling Idle time cost saving
Enabling Idle time cost saving is just as easy; here is a step-by-step walkthrough.
As shown in the section above, idle time cost saving is enabled by default in the Scale Out only (CPU) scaling strategy.
For a static cluster, you will need to do the following to enable the feature:
Step 1 – Check Scale to a single worker node when idle
Step 2 – By default, the cluster will resize to a single worker node after 30 minutes of inactivity; this can be set to anywhere between 10 minutes and 1 hour.
Changing the autoscaling configuration of an existing cluster
You can always change the configuration after a cluster has been created by following the steps below:
Step 1 – Navigate to the cluster details view
Step 2 – Edit the cluster scaling policy configuration
Step 3 – The cluster will update its configuration immediately after you click the Save button