5 Components of Data Warehouse Architecture
Data Warehouse Architecture
Tip: If you are struggling to get value from your data warehouse due to vendor lock-in or handling semi and unstructured data, set up a time to chat with an engineer about migrating to a Data Lakehouse.
What are the components of a data warehouse?
Most data warehouses will be built around a relational database system, either on-premise or in the cloud, where data is both stored and processed. Other components would include metadata management and an API connectivity layer allowing the warehouse to pull data from organizational sources and provide access to analytics and visualization tools.
A typical data warehouse has four main components: a central database, ETL (extract, transform, load) tools, metadata, and access tools. All of these components are engineered for speed so that you can get results quickly and analyze data on the fly.
The data warehouse has been around for decades. Born in the 1980s, it addressed the need for optimized analytics on data. As companies’ business applications began to grow and generate/store more data, they needed a system that could both manage the data and analyze it. At a high level, database admins could pull data from their operational systems and add a schema to it via transformation before loading it into their data warehouse (this process is also known as ETL – Extract, Transform, Load).
To learn more about the internal architecture of a data warehouse and its various components such as nodes and clusters, check out our previous article on Redshift data warehouse architecture.
Schema is made up of metadata (data about the data) so users could easily find what they were looking for. The data warehouse could also connect to many different data sources, so it became an easier way to manage all of a company’s data for analysis.
As data warehouse architecture evolved and grew in popularity, more people within a company started using it to access data – and the data warehouse made it easy to do so with structured data. This is where metadata became important. Reporting and dashboarding became a key use case, and SQL (structured query language) became the de facto way of interacting with that data.
Here’s a quick high level overview of the data warehouse architecture:
- SQL Query Processing
- Data layer
As mentioned above, ETL stands for Extract, Transform, Load. When DBAs want to move data from a data source into their data warehouse, this is the process they use. In short, ETL converts data into a usable format so that once it’s in the data warehouse, it can be analyzed/queried/etc. For the purposes of this article, I won’t go into too much detail of how the entire ETL process works, but there are many different resources where you can learn about ETL.
Metadata is data about data. Basically, it describes all of the data that’s stored in a system to make it searchable. Some examples of metadata include authors, dates, or locations of an article, create date of a file, the size of a file, etc. Think of it like the titles of a column in a spreadsheet. Metadata allows you to organize your data to make it usable, so you can analyze it to create dashboards and reports.
SQL Query Processing
SQL is the de facto standard language for querying your data. This is the language that analysts use to pull out insights from their data stored in the data warehouse. Typically data warehouses have proprietary SQL query processing technologies tightly coupled with the compute. This allows for very high performance when it comes to your analytics. One thing to note, however, is that the cost of a data warehouse can start getting expensive the more data and SQL compute resources you have.
The data layer is the access layer that allows users to actually get to the data. This is typically where you’d find a data mart. This layer partitions segments of your data out depending on who you want to give access to, so you can get very granular across your organization. For instance, you may not want to give your Sales team access to your HR team’s data, and vice versa.
This is related to the data layer in that you need to be able to provide fine grained access and security policies across all of your organization’s data. Typically data warehouses have very good governance and security capabilities built in, so you don’t need to do a lot of custom engineering work to include this. It’s important to plan for governance and security as you add more data to your warehouse and as your company grows.
+ Data Warehouse Access Tools
While access tools are external to your data warehouse, they can be seen as its business-user friendly front end. This is where you’d find your reporting and visualization tools, used by data analysts and business users to interact with the data, extract insights, and create visualizations that the rest of the business can consume. Examples of these tools include Tableau, Looker, and Qlik.
Challenges with a Data Warehouse Architecture
Now that I’ve laid out the five key components of a data warehouse architecture, let’s discuss some of the challenges of the data warehouse. As companies start housing more data and needing more advanced analytics and a wide range of data, the data warehouse starts to become expensive and not so flexible. If you want to analyze unstructured or semi-structured data, the data warehouse won’t work.
We’re seeing more companies moving to the Data Lakehouse architecture, which helps to address the above. The Open Data Lakehouse allows you to run warehouse workloads on all kinds of data in an open and flexible architecture. Instead of a tightly coupled system, the Data Lakehouse is much more flexible and also can manage unstructured and semi-structured data like photos, videos, IoT data, and more. Here’s what that architecture looks like:
The Data Lakehouse can also support your data science, ML and AI workloads in addition to your reporting and dashboarding workloads. If you are looking to upgrade from data warehouse architecture, then developing an Open Data Lakehouse is the way to go.
If you’re interested in learning more about why companies are moving from the data warehouse to the data lakehouse, check out this free whitepaper on how to Unlock the Business Value of the Data Lake/Data Lakehouse, or read about the differences between a data lakehouse, a data mesh, and a data warehouse.
Data Warehouse: A Comprehensive Guide
Data warehouse architecture is a data repository, typically used for analytic systems and Business Intelligence tools. Take a look at this article to get a better understand of what it is and how it’s used.
Data Warehouse Concepts for Beginners
A relational database that is designed for query and analysis rather than for transaction processing. Learn more here.