ETL and ELT in Data Warehousing
What is ETL used for?
If you’re looking to understand the differences between ETL and ELT, let’s start by explaining what each one is. ETL, or Extract, Transform, Load, is when an ETL tool or a series of homegrown programs extracts data from one or more data sources, often relational databases, and performs transformation functions. Those transformations could be data cleansing, standardization, enrichment, and so on. The transformed data is then written (loaded) into a new repository, often a data warehouse.
In short, in the ETL process, an ETL tool or series of programs extracts the data from the various RDBMS source systems, transforms it by applying calculations, concatenations, and other operations, and then loads it into the data warehouse.
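To make the three stages concrete, here is a minimal sketch of an ETL pipeline in Python. The table names, columns, and use of in-memory SQLite for both the source and the warehouse are illustrative assumptions; a real pipeline would connect to separate RDBMS systems over the network.

```python
import sqlite3

# Hypothetical source and warehouse databases; in practice these would be
# separate systems, and the ETL tool would run on its own server.
source = sqlite3.connect(":memory:")
warehouse = sqlite3.connect(":memory:")

source.execute("CREATE TABLE customers (first TEXT, last TEXT, country TEXT)")
source.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                   [("ada", "lovelace", "uk"), ("alan", "turing", "UK")])

# Extract: pull the raw rows out of the source system.
rows = source.execute("SELECT first, last, country FROM customers").fetchall()

# Transform: cleanse and standardize *before* loading -- title-case the
# names, concatenate them into a full name, and uppercase the country code.
transformed = [(f"{first.title()} {last.title()}", country.upper())
               for first, last, country in rows]

# Load: write the cleaned rows into the warehouse schema.
warehouse.execute("CREATE TABLE dim_customer (full_name TEXT, country TEXT)")
warehouse.executemany("INSERT INTO dim_customer VALUES (?, ?)", transformed)
warehouse.commit()
```

The key property of ETL is visible here: only the transformed rows ever reach the warehouse; the raw data is not retained in the destination.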
What is ELT used for?
ELT, or Extract, Load, Transform, turns the ETL process around a little bit: you extract the raw data from the data source and load it directly into the destination, without any processing in between. The transformation is then done “in place” in the destination repository. Generally, the raw data is stored indefinitely, so various transformations and enrichments can be done by any user with access to it, using tools they are already familiar with.
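The contrast with ETL can be sketched the same way. In this illustrative example (again using in-memory SQLite as a stand-in for a warehouse or data lake, with made-up table names), the raw data lands untouched and the transformation lives inside the destination as a SQL view:

```python
import sqlite3

# Hypothetical destination; a real ELT target would be a data lake or a
# cloud warehouse cheap enough to keep the raw data indefinitely.
warehouse = sqlite3.connect(":memory:")

# Extract + Load: raw rows go straight into a landing table, untouched.
warehouse.execute("CREATE TABLE raw_events (user TEXT, amount_cents INTEGER)")
warehouse.executemany("INSERT INTO raw_events VALUES (?, ?)",
                      [("ada", 1250), ("alan", 300), ("ada", 99)])

# Transform: done "in place" inside the destination, here as a SQL view
# that any analyst with access can query -- the raw table stays intact,
# so other users can build different transformations over the same data.
warehouse.execute("""
    CREATE VIEW spend_by_user AS
    SELECT user, SUM(amount_cents) / 100.0 AS dollars
    FROM raw_events
    GROUP BY user
""")
```

Because the raw landing table survives, a second team could define a completely different view over it later without re-extracting anything from the source.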
Both are data integration styles that share the same ultimate goal, but they are implemented very differently. Now that we know what they are and understand the ETL and ELT processes, let’s dive deeper into how they differ from one another.
What is the difference between ETL and ELT?
So how does ETL vs ELT break down?
| | ETL | ELT |
|---|---|---|
| Definition | Data is extracted from any number of data sources, transformed in a separate process, and then loaded into the destination repository. | Data is extracted from any number of data sources and loaded directly into the destination repository. Transformation occurs inside the destination. |
| Transformation | Data is transformed in an intermediate processing step that is independent of extract and load. | Data can be transformed on an ad hoc basis during reads, or in batch and stored in another set of tables. |
| Code-Based Transformations | Primarily executed in the compute-intensive transformation process. | Primarily executed in the database, but also done ad hoc through analysis tools. |
| Data Lake Support | Only in the sense that a data lake can be used as storage for the transformation step. | Well suited to the data lake. |
| Cost | Specialized servers for transformation can add significant cost. | Object stores are very inexpensive and require no specialized servers. |
| Maintenance | Additional servers add to the overall maintenance burden. | Fewer systems mean less to maintain. |
| Loading | Data has to be transformed prior to loading. | Data is loaded directly into the destination system. |
| Maturity | ETL tools and methods have been around for decades and are well understood. | Relatively new on the scene, with emerging standards and less accumulated experience. |
Let’s take HIPAA-regulated healthcare data as an example of data that lends itself to ETL rather than ELT. The raw data contains a lot of sensitive information about patients that isn’t allowed to be shared, so you would need to run it through the transformation process prior to loading to remove that sensitive information. Say your analysts are trying to track treatments for different types of cancer across a geographic region. In the transformation step you would scrub the data down to fields such as treatment date, location, cancer type, age, and gender, while removing any information that identifies the patient.
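A scrubbing transform of this kind might look like the sketch below. The field names are hypothetical, and real HIPAA de-identification (e.g., the Safe Harbor method) removes many more identifiers than shown here; the point is simply that the removal happens before the load.

```python
# Hypothetical identifying fields to strip during the transform step.
SENSITIVE_FIELDS = {"patient_name", "ssn", "address", "phone"}

def scrub(record: dict) -> dict:
    """Drop identifying fields so the record is safe to load."""
    return {k: v for k, v in record.items() if k not in SENSITIVE_FIELDS}

raw = {
    "patient_name": "Jane Doe",      # must never reach the warehouse
    "ssn": "000-00-0000",            # must never reach the warehouse
    "treatment_date": "2023-04-01",
    "location": "OR",
    "cancer_type": "melanoma",
    "age": 54,
}
clean = scrub(raw)  # only the analytic fields survive
```

In an ELT pipeline, by contrast, the raw record would land in the destination intact, which is exactly what a compliance regime like this forbids.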
An ELT approach makes more sense with a data lake, where you have lots of structured, semi-structured, and unstructured data. This can also include high-velocity data where you are trying to make decisions in near real time. Consider an MMORPG where you want to offer incentives to players in a particular region who have performed a particular task. That data is probably coming in through a streaming platform such as Kafka, and analysts are running transformation jobs on the fly to distill it down to just the information needed to fuel the desired action.
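An on-the-fly transform like that can be sketched as a simple filter over the event stream. The event shape and field names here are invented for illustration, and the list stands in for a real consumer loop from a Kafka client library such as kafka-python or confluent-kafka:

```python
# Stand-in for messages consumed from a Kafka topic; in production this
# would be an iterator over a Kafka consumer's deserialized records.
events = [
    {"player": "p1", "region": "eu", "task": "dragon_slain"},
    {"player": "p2", "region": "na", "task": "dragon_slain"},
    {"player": "p3", "region": "eu", "task": "login"},
]

def eligible(stream, region, task):
    """On-the-fly transform: distill the raw stream down to just the
    players who should receive the incentive."""
    for event in stream:
        if event["region"] == region and event["task"] == task:
            yield event["player"]

winners = list(eligible(events, region="eu", task="dragon_slain"))
```

Because the transform is a generator over the stream, it never needs the full raw dataset materialized first, which is what makes near-real-time decisions possible.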
In summary, the difference between ETL and ELT in data warehousing really comes down to how you are going to use the data as illustrated above. They satisfy very different use cases and require thoughtful planning and a good understanding of your environment and goals. If you’re exploring whether to use a data warehouse or a data lake, we have some resources that might be helpful. Check out our white paper on Unlocking the Business Value of the Data Lake which discusses the data lake approach in comparison to the data warehouse.