ETL Concepts | Extract Transform Load Concepts with Examples

ETL Concepts :

In my previous article i have given idea about the ETL definition with its real life examples. In this article i would like to explain the ETL concept in depth so that user will get idea about different ETL Concepts with its usages.I will explain all the ETL concepts with real world industry examples.What exactly the ETL means. ETL stands for Extract Transform Load. ETL is nothing but Extract,Transform and Loading of the data from multiple heterogeneous data sources to a single or multiple sources.

ETL Definition

How to choose ETL Tool?

In this section i would like to explain about the ETL Tools :

  • Data connectivity : ETL tool should communicate with any source of data.
  • Performance: Moving and Changing data is not a simple activity. By considering the data volume user needs to check the performance of ETL tool
  • Transformation Flexibility : Lot of complex transformation should be made with simple drag and drop in ETL Tools
  • Data Quality : Check whether data is consistent and clean. Check for the tool which will provide facility of Data Quality.

Some Important terms in ETL Concepts :

In this section i would like to give you some Important terms before i explain detail about ETL.

Source System :

The source system is any system from which we are extracting the data.

Example : It should be any database,Application,File with data.

Target System : 

Target System is any system where we are loading the data.

Example : It should be any database,Application,File with data.

Data Mapping :

The Data mapping is nothing but the relationships between database objects.

Staging Area :

The Staging area is nothing but the database area where all processing of the data will be done.

Metadata :

Metadata is data within a data.

These are some important terms to learn ETL Concepts.

ETL Concepts in detail :

In this section i would like to give you the ETL Concepts with detailed description.

Step 1 : Data Extraction :

The data extraction is first step of ETL. There are 2 Types of Data Extraction

1.Full Extraction :  All the data from source systems or operational systems gets extracted to staging area. (Initial Load)

2.Partial Extraction : Sometimes we get notification from the source system to update specific date. It is called as Delta load.

Source System Performance : The Extraction strategies should not affect source system performance.

Step 2 : Data Transformation :

The data transformation is second step.After extracting the data there is big need to do the transformation as per the target system.I would like to give you some bullet points of Data Transformation.

  • Data Extracted from source system is in to Raw format. We need to transform it before loading in to target server.
  • Data has to be cleaned, mapped and transformed
  • There are following important steps of Data Transformation :

1.Selection : Select data to load in target

2.Matching : Match the data with target system

3.Data Transforming : We need to change data as per target table structures

Real life examples of Data Transformation :

  • Standardizing data : Data is fetched from multiple sources so it needs to be standardized as per the target system.
  • Character set conversion : Need to transform the character sets as per the target systems. (Firstname and last name example)
  • Calculated and derived values: In source system there is first val and second val and in target we need the calculation of first val and second val.
  • Data Conversion in different formats : If in source system date in in DDMMYY format and in target the date is in DDMONYYYY format then this transformation needs to be done at transformation phase.

Step 3 : Data Loading

  • Data loading phase loads the prepared data from staging tables to main tables.

There are following 3 Types of Data Loading Strategies :

  • Initial load : Populating all the data tables from source system and loads it in to data warehouse table.
  • Incremental Load : Applying the ongoing changes as necessary in periodic manner.
  • Full Refresh : Completely erases the data from one or more tables and reload the fresh data.

These are some most important ETL concepts.I hope you will get better idea about the ETL. I will explain each and every step of ETL concepts in next articles for sure.If you like this article or if you have any suggestions with the same kindly comment in comment section.

 

Leave a Reply