AWS Redshift Query Limits
What is Amazon Redshift?
At its heart, Redshift is an Amazon petabyte-scale data warehouse product that is based on PostgreSQL version 8.0.2. It has evolved and been enhanced since then into a powerful distributed system that can provide speedy results across millions of rows. Conceptually it is based on node clusters, with a leader node and compute nodes. The leader generates the execution plan for queries and distributes those tasks to the compute nodes. Scalability is achieved with elastic scaling that can add/modify worker nodes as needed and quickly. We’ll discuss the details in the article below.
Limitations of Using Amazon Redshift
There are of course Redshift limitations on many parameters, which Amazon refers to as “quotas”. There is a Redshift query limit, a database limit, a Redshift query size limit, and many others. These have default values from Amazon and are per AWS region. Some of these quotas can be increased by submitting an Amazon Redshift Limit Increase Form. Below is a table of some of these quota limitations.
|Nodes per cluster||128||Yes|
|Nodes per region||200||Yes|
|Schemas per DB per cluster||9,900||No|
|Tables per node type||9,900 – 100,000||No|
|Databases per cluster||60||No|
|Stored procedures per DB||10,000||No|
|Query size limit||100,000 rows||Yes|
|Correlated Subqueries||Need to be rewritten||No|
AWS Redshift Performance
To start, Redshift is storing data in compressed, columnar format. This means that there is less area on disk to scan and less data that has to be moved around. Add to that indexing and you have the base recipe for high performance. In addition, Redshift maintains a results cache, so frequently executed queries are going to be highly performant. This is aided by the query plan optimization done in the leader node. Redshift also optimizes the data partitioning in a highly efficient manner to complement the optimizations done in the columnar data algorithms.
There are a robust number of scaling strategies available from Redshift. With just a few clicks in the AWS Redshift console, or even with a single API call, you can change node types, add nodes and pause/resume the cluster. You are also able to use Elastic Resize to dynamically adjust your provisioned capacity within a few minutes. A Resize Scheduler is also available where you can schedule changes, say for month-end processing for example. There is also Concurrency Scaling that can automatically provision additional capacity for dynamic workloads.
A lot of variables go into Redshift pricing depending on the scale and features you go with. All of the details and a pricing calculator can be found on the Amazon Redshift Pricing page. To give you a quick overview, however, prices start as low as $.25 per hour. Pricing is based on compute time and size and goes up to $13.04 per hour. Amazon provides some incentives to get you started and try out the service.
First, similar to the Ahana Cloud Commnity Edition, Redshift has a “Free Tier”, if your company has never created a Redshift cluster then you are eligible for a DC2 large node trial for two months. This provides 750 hours per month for free, which is enough to continuously run that DC2 node, with 160GB of compressed SSD storage. Once your trial expires or your usage exceeds 750 hours per month, you can either keep it running with their “on-demand” pricing, or shut it down.
Next, there is a $500 credit available to use their Amazon Redshift Serverless option if you have never used it before. This applies to both the compute and storage and how long it will last depends entirely on the compute capacity you selected, and your usage.
Then there is “on-demand” pricing. This option allows you to just pay for provisioned capacity by the hour with no commitments or upfront costs, partial hours are billed in one-second increments. Amazon allows you to pause and resume these nodes when you aren’t using them so you don’t continue to pay, and you also preserve what you have, you’ll only pay for backup storage.
Redshift provides a robust, scalable environment that is well suited to managing data in a data warehouse. Amazon provides a variety of ways to easily give Redshift a try without getting too tied in. Not all analytic workloads make sense in a data warehouse, however, and if you are already landing data into AWS S3, then you have the makings of a data lakehouse that can offer better price/performance. A managed Presto service, such as Ahana, can be the answer to that challenge.