ETL process diagram

ETL vs ELT in Data Warehousing

Introduction

ETL, or Extract Transform Load, is when an ETL tool or series of homegrown programs extracts data from a data source(s), often a relational database, and performs transformation functions. Those transformations could be data cleansing, standardizations, enrichment, etc., and then write (load) that data into a new repository, often a data warehouse. 

In the ETL process, an ETL tool or series of programs extracts the data from different RDBMS source systems, and then transforms the data, by applying calculations, concatenations, etc., and then loads the data into the Data Warehouse system.

ETL process diagram

ELT, or Extract Load Transform turns the ETL process around a little bit and has you extract the raw data out from the data source and directly load it into the destination, without any processing in between. The transformation process is then done “in place” in the destination repository. Generally, the raw data is stored indefinitely so various transformations and enrichments can all be done by users with access to it, using tools they are familiar with.

ELT, or Extract Load Transform

Both are data integration styles and have much in common with their ultimate goals, but are implemented very differently.

Comparison

So how does ETL vs ELT break down?

CategoryETLELT
DefinitionData is extracted from ‘n’ number of data sources. Transformed in a separate process, then loaded into the destination repository.Data is extracted from ‘n’ number of data sources and directly loaded into the destination repository. Transformation occurs inside the destination.
TransformationData is transformed within an intermediate processing step that is independent of extract and load.Data can be transformed on an ad-hoc basis during reads, or in batch and stored in another set of tables.
Code-Based TransformationsPrimarily executed in the compute-intensive transformation process.Primarily executed in the database but also done ad-hoc through analysis tools.
Data Lake SupportOnly in the sense that it can be utilized as storage for the transformation step.Well oriented for the data lake.
CostSpecialized servers for transformation can add significant costs.Object stores are very inexpensive, requiring no specialized servers.
MaintenanceAdditional servers add to the overall maintenance burden.Fewer systems mean less to maintain.
LoadingData has to be transformed prior to loading. Data is loaded directly into the destination system.
MaturityETL tools and methods have been around for decades and are well understood.Relatively new on the scene, with emerging standards and less experience.

Use Cases

Let’s take HIPAA as an example of data that would lend itself to ETL rather than ELT. The raw HIPAA data contains a lot of sensitive information about patients that isn’t allowed to be shared, so you would need to go through the transformation process prior to loading it to remove any of that sensitive information. Say your analysts were trying to track cancer treatments for different types of cancer across a geographic region. You would scrub your data down in the transformation process to include treatment dates, location, cancer type, age, gender, etc., but remove any identifying information about the patient.

An ELT approach makes more sense with a data lake where you have lots of structured, semi-structured, and unstructured data. This can also include high-velocity data where you are trying to make decisions in near real-time. Consider an MMORPG where you want to offer incentives to players in a particular region that have performed a particular task. That data is probably coming in through a streaming protocol such as Kafka and analysts are doing transforming jobs on the fly to distill it down to the necessary information to fuel the desired action.

Summary

In summary, the difference between ETL and ELT in data warehousing really comes down to how you are going to use the data as illustrated above. They satisfy very different use cases and require thoughtful planning and a good understanding of your environment and goals. If you’re exploring whether to use a data warehouse or a data lake, we have some resources that might be helpful. Check out our white paper on Unlocking the Business Value of the Data Lake which discusses the data lake approach in comparison to the data warehouse.