
The Fundamental Problems with Amazon Redshift

In the last article we discussed the difference between Redshift and Redshift Spectrum. In this article, let's look at the problems with Amazon Redshift and some of the available alternatives.

Amazon Redshift made it easy for anyone to implement data warehouse use cases in the cloud. However, it is unable to provide the same benefits as newer, more advanced cloud data warehouses. When Redshift was a relatively new technology, everyone was going through a learning curve.

Here are some of the fundamental problems with Amazon Redshift:

AWS Redshift’s Cost

Amazon Redshift is a traditional MPP platform where compute is closely integrated with storage. The advantage of the cloud is that, in theory, compute and storage are completely independent of each other and storage is virtually unlimited. With Redshift, however, if you want more storage you have to purchase more compute power. As data volumes increase, the cost of storage and compute in the warehouse becomes challenging. AWS Redshift and Redshift Spectrum come at a premium, especially if you use Spectrum outside of AWS Redshift. This can make Amazon Redshift one of the most expensive cloud data warehouse solutions.
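To make the coupling concrete, here is a minimal sketch of how storage needs alone can drive the node count, and therefore the bill, when storage can only be bought together with compute. The per-node capacity and hourly price below are hypothetical placeholders for illustration, not actual AWS node specifications or pricing.

```python
import math

# Hypothetical figures for illustration only; not real AWS specs or prices.
STORAGE_PER_NODE_TB = 2.0
PRICE_PER_NODE_HOUR = 1.0


def nodes_for_storage(data_tb: float,
                      storage_per_node_tb: float = STORAGE_PER_NODE_TB) -> int:
    """Return the smallest node count whose combined disk holds data_tb."""
    return max(1, math.ceil(data_tb / storage_per_node_tb))


def monthly_cost(data_tb: float, hours: int = 730) -> float:
    """Cost when nodes are bought purely to get enough storage."""
    return nodes_for_storage(data_tb) * PRICE_PER_NODE_HOUR * hours
```

Under these assumed numbers, going from 10 TB to 10.5 TB adds a whole node of compute even if query load has not changed.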

Vendor lock-in with Redshift

Data warehouse services like AWS Redshift make it difficult to use your data outside of them. Data needs to be pulled out of the warehouse and duplicated, further driving up compute costs.

Proprietary data formats

Data architects, data engineers, and analysts are required to use the data format supported by the data warehouse. No flexible or open data formats are available.

No Staging Area in Redshift

Hosting data in Amazon Redshift is expensive, so duplication of data has to be avoided at all costs. In traditional RDBMS systems, we tend to keep the landing, staging, and warehouse layers in the same database. With Amazon Redshift, the landing and staging layers have to be on S3. Only the data on which reports and analytics will be built should be loaded into Redshift, on an as-needed basis; you can't keep the entire dataset in Redshift.
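A minimal sketch of that pattern: raw files stay in S3, and only the reporting table is loaded with Redshift's COPY command. The helper name, table, bucket, and IAM role ARN are hypothetical, and the function only builds the SQL text; it does not connect to a cluster.

```python
def build_copy_statement(table: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Build a Redshift COPY statement that loads Parquet files from S3.

    Inputs are interpolated directly, so this sketch assumes trusted values.
    """
    if not s3_prefix.startswith("s3://"):
        raise ValueError("s3_prefix must start with s3://")
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS PARQUET;"
    )


if __name__ == "__main__":
    # All names below are made up for illustration.
    print(build_copy_statement(
        "reporting.sales",
        "s3://example-bucket/staging/sales/",
        "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
    ))
```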

No Index support in Amazon Redshift

Unlike many other data warehouse systems, Redshift doesn't support indexes. It is designed to perform best when you select only the columns you absolutely need to query. Because Redshift is a columnar, multi-node store, it relies instead on a construct called a distribution key: a column whose values determine how rows are distributed across the nodes of the Redshift cluster.
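The idea of a distribution key can be sketched as a simple simulation: hash the key column and use the result to pick a node. Redshift's actual hash function and slice layout are internal, so the SHA-256 hashing and the function names here are assumptions made purely for illustration.

```python
import hashlib
from collections import Counter


def node_for_key(key_value: str, num_nodes: int) -> int:
    """Map a distribution-key value to a node index with a stable hash."""
    digest = hashlib.sha256(key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def rows_per_node(key_values, num_nodes: int) -> Counter:
    """Count how many rows land on each node for the given key column."""
    return Counter(node_for_key(v, num_nodes) for v in key_values)


if __name__ == "__main__":
    # A high-cardinality key spreads rows; a single repeated value does not.
    print(rows_per_node([str(i) for i in range(1000)], 4))
    print(rows_per_node(["US"] * 1000, 4))
```

The second call shows why key choice matters: every row with the same key value lands on the same node, so a low-cardinality or heavily skewed column leaves one node doing most of the work.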

Manual house-keeping

Many performance issues have to be handled through proper maintenance and tuning: VACUUM and ANALYZE, sort keys, compression, distribution styles, and so on.

Tasks like VACUUM and ANALYZE need to be run regularly, and they are expensive and time-consuming. There is no single frequency that suits every workload, so a quick cost-benefit analysis is needed before deciding how often to run them.
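One way to frame that cost-benefit decision is to run maintenance only when a table has drifted far enough to justify it. The sketch below assumes you already have a table's unsorted percentage and statistics staleness (for example from the unsorted and stats_off columns of the SVV_TABLE_INFO system view); the thresholds and the function name are hypothetical and would need tuning per workload.

```python
from typing import List


def maintenance_statements(table: str,
                           unsorted_pct: float,
                           stats_off_pct: float,
                           vacuum_threshold: float = 20.0,
                           analyze_threshold: float = 10.0) -> List[str]:
    """Return the maintenance SQL worth running for one table.

    Thresholds are illustrative defaults, not recommended values.
    """
    statements = []
    if unsorted_pct >= vacuum_threshold:
        statements.append(f"VACUUM {table};")
    if stats_off_pct >= analyze_threshold:
        statements.append(f"ANALYZE {table};")
    return statements
```

A table that is 35% unsorted with fresh statistics would get only a VACUUM, while a well-sorted table with stale statistics would get only an ANALYZE.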

Disk space capacity planning

Control over disk space is a must with Amazon Redshift, especially when you're dealing with analytical workloads. There is a high chance of oversubscribing the system, and reduced free disk space not only degrades query performance but also makes the cluster cost-prohibitive. Having a cluster filled above 75% isn't good for performance.
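The 75% guideline is easy to turn into a capacity check. This is a minimal sketch; the function names are hypothetical, and the used and total figures are assumed to come from whatever monitoring you already have.

```python
def disk_usage_pct(used_gb: float, capacity_gb: float) -> float:
    """Return the percentage of cluster disk currently in use."""
    if capacity_gb <= 0:
        raise ValueError("capacity_gb must be positive")
    return 100.0 * used_gb / capacity_gb


def is_over_threshold(used_gb: float, capacity_gb: float,
                      threshold_pct: float = 75.0) -> bool:
    """True when usage is strictly above the threshold (75% by default)."""
    return disk_usage_pct(used_gb, capacity_gb) > threshold_pct
```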

Concurrent query limitation

Above 10 concurrent queries, you start seeing issues. Concurrency scaling may mitigate queue times during bursts of queries. However, simply enabling concurrency scaling didn't fix all of our concurrency problems. The limited impact is likely due to constraints on the types of queries that can use concurrency scaling. For example, we have a lot of tables with interleaved sort keys, and much of our workload is writes.
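One possible client-side mitigation, not something described above but a common pattern, is to cap how many queries an application sends at once. The sketch below uses a semaphore for that; the class name, the default limit of 10, and the fake_query stand-in are all assumptions for illustration.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class QueryLimiter:
    """Allow at most max_concurrent calls to run at the same time."""

    def __init__(self, max_concurrent: int = 10):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def run(self, query_fn, *args):
        # Blocks until a slot is free; the slot is released when
        # query_fn returns or raises.
        with self._slots:
            return query_fn(*args)


def fake_query(seconds: float) -> str:
    """Hypothetical stand-in for a real database call."""
    time.sleep(seconds)
    return "done"


if __name__ == "__main__":
    limiter = QueryLimiter(max_concurrent=10)
    with ThreadPoolExecutor(max_workers=20) as pool:
        futures = [pool.submit(limiter.run, fake_query, 0.01) for _ in range(20)]
        print(len([f.result() for f in futures]), "queries finished")
```

This only smooths bursts coming from a single client process; it does not change how the cluster itself queues work.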

Conclusion

These are some of the fundamental problems to keep in mind while using Amazon Redshift. For more information about AWS Redshift query limitations, check out this article.
