In my previous article I explained the definition of ETL with examples, and I also gave some interview questions related to ETL testing. In this article I would like to give some basic ETL interview questions with detailed answers and examples. These are the most important ETL interview questions.
Question 1 : What is meant by ETL? Explain with examples. (100% asked ETL interview questions with answers)
As the name suggests, ETL stands for Extract, Transform and Load. An ETL tool extracts data from the source, transforms the data while in transit, and then loads the data into the specified database.
Following are two different definitions of ETL :
Definition 1 :
ETL is the extraction, transformation and loading of data from multiple heterogeneous data sources into a single target or multiple targets.
Definition 2 :
ETL is the extraction, transformation and loading of data.
Question 2 : What is the difference between ELT and ETL? (100% asked ETL interview questions with answers)
| Parameter | ETL | ELT |
|---|---|---|
| Definition | ETL stands for Extract, Transform and Load: the first step is to extract the data, the second is to change the data, and the third is to load it. | ELT stands for Extract, Load and Transform: the data is extracted first, then loaded, and the transformation is done last. |
| Usages | The ETL system is mostly used in a data warehouse where we require transformed data. | The ELT system is mostly used where we need to transform the data after loading it. |
| Flexibility | The ETL system is a highly flexible system. | The ELT system is less flexible than the ETL system. |
| Working methodology | Data is transformed on its way from the source system to the data warehouse. | Leverages the target system to transform the data. |
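The difference in ordering can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory SQLite database stands in for the target system, and the table names (`customer`, `raw_customer`), the sample rows and the gender codes are hypothetical.

```python
import sqlite3

# Hypothetical source rows: (first_name, last_name, gender_code)
source_rows = [("Asha", "Rao", 1), ("Maya", "Iyer", 2)]


def run_etl(rows):
    """ETL: transform in application code, then load the finished rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (full_name TEXT, gender TEXT)")
    transformed = [
        (f"{first} {last}", "M" if code == 1 else "F")
        for first, last, code in rows
    ]
    conn.executemany("INSERT INTO customer VALUES (?, ?)", transformed)
    return conn.execute(
        "SELECT full_name, gender FROM customer ORDER BY full_name"
    ).fetchall()


def run_elt(rows):
    """ELT: load the raw rows first, then transform inside the target with SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE raw_customer "
        "(first_name TEXT, last_name TEXT, gender_code INTEGER)"
    )
    conn.executemany("INSERT INTO raw_customer VALUES (?, ?, ?)", rows)
    # The target database does the transformation work.
    conn.execute(
        """
        CREATE TABLE customer AS
        SELECT first_name || ' ' || last_name AS full_name,
               CASE gender_code WHEN 1 THEN 'M' ELSE 'F' END AS gender
        FROM raw_customer
        """
    )
    return conn.execute(
        "SELECT full_name, gender FROM customer ORDER BY full_name"
    ).fetchall()


etl_result = run_etl(source_rows)
elt_result = run_elt(source_rows)
```

Both functions end with the same `customer` table. The only difference is where the transformation runs: in the pipeline code for ETL, inside the target database for ELT.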
Question 3 : What is the difference between ETL and Business Intelligence? (100% asked ETL interview questions with answers)
| Parameter | ETL | Business Intelligence (BI) |
|---|---|---|
| Definition | ETL stands for Extract, Transform and Load, and it is used in data warehousing systems. ETL is the extraction, transformation and loading of data from multiple heterogeneous data sources into a single target or multiple targets. | Business intelligence is used for reporting on the important data of a business, which the higher management uses to take decisions for the growth of the business. |
| Usages | The ETL system is mostly used in a data warehouse where we require transformed data. | The data coming from the ETL system is used to create BI reports that track organizational growth. |
| Flexibility | The ETL system is a highly flexible system. | The BI system is a reporting system used to check business growth. It may or may not be flexible. |
| Working methodology | Moves data from the source system to the data warehouse. | Uses the data warehouse data to create interactive reports. |
| Tools | ETL tools are used to extract the data from different data sources, transform the data, and load it into a data warehouse system. | BI tools are used to generate interactive and ad-hoc reports for end-users, and data visualizations for monthly, quarterly, and annual board meetings. |
Question 4 : What are the different types of ETL tools?
Following are some important ETL tools :
1.PowerCenter by Informatica
2.DataStage by IBM
3.SAP BusinessObjects Data Services (BODS)
4.Oracle Warehouse Builder
5.Data Integration Studio by SAS
6.Open source ETL tool named CloverETL
Question 5 : Which checks need to be completed after the ETL process?
Data transformation checks : We need to check whether the data is transformed as per the requirements.
Data count checks : We need to check that the expected data is loaded properly without any data loss.
Invalid data : Make sure that no invalid data is loaded into the target.
Scalability : Make sure that the performance of the data load stays acceptable as the data volume grows.
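The first three checks can be sketched as a small Python function. This is a minimal sketch, assuming the source and target rows are available as lists of dictionaries. The function name `check_load`, the field names and the sample rows are hypothetical, and the scalability check is left out because it needs timing data from real loads.

```python
def check_load(source_rows, target_rows, transform, required_fields):
    """Run basic post-load checks and return the results as a dict."""
    # Transformation check: re-apply the expected rule and compare with the target.
    expected = [transform(row) for row in source_rows]
    # Invalid data check: required fields must not be missing or empty.
    invalid_rows = [
        row for row in target_rows
        if any(row.get(field) in (None, "") for field in required_fields)
    ]
    return {
        "transformation_ok": expected == target_rows,
        # Count check: no rows lost or duplicated during the load.
        "count_ok": len(source_rows) == len(target_rows),
        "invalid_rows": invalid_rows,
    }


source = [
    {"first": "Asha", "last": "Rao"},
    {"first": "Maya", "last": "Iyer"},
]
# Hypothetical target content in which the second row was loaded empty.
target = [{"full_name": "Asha Rao"}, {"full_name": ""}]

report = check_load(
    source,
    target,
    transform=lambda r: {"full_name": f"{r['first']} {r['last']}"},
    required_fields=["full_name"],
)
```

Here the row counts match, but the report flags the empty `full_name` both as invalid data and as a transformation mismatch.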
Question 6 : Explain the different layers in ETL systems. (100% asked ETL interview questions with answers)
- Staging Layer: The most important layer of an ETL system is the staging layer, where we collect and store the data coming from multiple heterogeneous sources before processing it.
- Data Integration Layer: The data integration layer is responsible for transforming the data from the staging layer and moving it to the warehouse database. In the data integration layer the data is organized into hierarchical groups, facts, dimensions, etc. The schema is created by combining multiple facts and dimensions.
- Access Layer: The access layer is used by the end-users to retrieve the data for analytical reporting.
Question 7 : Explain what exactly the data extraction process means in an ETL system.
The data extraction process is explained pointwise here :
1.The initial step is to extract the data from multiple heterogeneous source systems.
2.After extracting the data you need to place it in the staging area. Most data warehouse systems use data from multiple sources, and we need to combine that data at the staging point.
3.Each source system has its own data format. The data may be in a relational DBMS or in formats such as XML or JSON, and we also need to include non-relational systems in the extraction process, such as Information Management System (IMS), or other data structures such as Virtual Storage Access Method (VSAM) or Indexed Sequential Access Method (ISAM).
4.The extraction process is the process in which we convert the data into a single format that is appropriate for the transformation process.
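The points above can be sketched with Python's standard library. This is only an illustration: two small in-memory strings stand in for a CSV source and a JSON source, and the names `CSV_SOURCE`, `JSON_SOURCE` and `extract_all` are hypothetical. Each extractor converts its own format into one common row format and the rows are combined in a staging list.

```python
import csv
import io
import json

# Hypothetical sources in two different formats.
CSV_SOURCE = "id,name\n1,Asha\n2,Maya\n"
JSON_SOURCE = '[{"id": 3, "name": "Ravi"}]'


def extract_csv(text):
    """Read CSV text and return rows in the common format."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"id": int(row["id"]), "name": row["name"]} for row in reader]


def extract_json(text):
    """Read JSON text and return rows in the common format."""
    return [{"id": int(row["id"]), "name": row["name"]} for row in json.loads(text)]


def extract_all():
    """Combine all sources in one staging list, tagging each row with its origin."""
    staging = []
    sources = (
        ("csv", extract_csv(CSV_SOURCE)),
        ("json", extract_json(JSON_SOURCE)),
    )
    for source_name, rows in sources:
        for row in rows:
            staging.append({**row, "source": source_name})
    return staging


staging_rows = extract_all()
```

Keeping the `source` tag on each staged row makes it easier to trace a record back to its origin during later checks.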
Question 8 : Explain extraction, transformation and loading in a single statement each.
Extract : Take data from an external source and move it to the warehouse pre-processor database.
Transform : The transform data task allows point-to-point generating, modifying and transforming of data.
Load : The load data task adds records to a database table in a warehouse.
Question 9 : What are the different types of fact tables? (100% asked ETL interview questions with answers)
Fact tables are classified into the following 3 types by the kind of measure they hold :
Additive : Measures that can be added across any dimension.
Non-additive : Measures that cannot be added across any dimension.
Semi-additive : Measures that can be added across some dimensions only.
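A small worked example may make the three kinds of measure easier to remember. The rows here are hypothetical bank data (one row per account per day) and are only meant as a sketch: deposits behave as an additive measure, balances as a semi-additive measure, and a ratio as a non-additive measure.

```python
# Hypothetical fact rows: one row per account per day.
facts = [
    {"date": "2024-01-01", "account": "A", "deposit": 100, "balance": 500},
    {"date": "2024-01-01", "account": "B", "deposit": 50, "balance": 300},
    {"date": "2024-01-02", "account": "A", "deposit": 20, "balance": 520},
    {"date": "2024-01-02", "account": "B", "deposit": 10, "balance": 310},
]

# Additive: deposits can be summed across every dimension (date and account).
total_deposits = sum(row["deposit"] for row in facts)

# Semi-additive: balances can be summed across accounts for ONE date...
balance_on_jan_2 = sum(
    row["balance"] for row in facts if row["date"] == "2024-01-02"
)
# ...but summing them across dates counts the same money twice.
misleading_total = sum(row["balance"] for row in facts)

# Non-additive: a ratio cannot be summed at all. It has to be recomputed
# from its additive parts at the level being reported.
deposits_on_jan_2 = sum(
    row["deposit"] for row in facts if row["date"] == "2024-01-02"
)
deposit_ratio_jan_2 = deposits_on_jan_2 / balance_on_jan_2
```

The `misleading_total` value is computed only to show the problem: it is not a meaningful balance for any point in time.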
Question 10 : What are the different usages of the ETL process? (80% asked ETL interview questions with answers)
- Before ETL tools, the user needed to write long code for everything from data transformation to data loading.
- ETL makes life simple, because one tool manages all the scenarios of transformation and loading of the data.
- Following are examples where ETL is used :
Example 1 : Data warehousing
ETL is used in data warehousing. The user needs to fetch the data from multiple heterogeneous systems and load it into the data warehouse database. The ETL concept is mainly used here to extract the data from the source, transform the data and load it into the target systems.
Example 2 : Data migrations
Data migrations are difficult efforts if you do them with PL/SQL or T-SQL development. If you want to migrate the data in a simpler way, use one of the ETL tools.
Example 3 : Mergers and acquisitions
Nowadays a lot of companies are merging into different MNCs. The ETL concept is used to move the data from one company to another.
Question 11 : What are the different types of data model used in the ETL process? (50% asked ETL interview questions with answers)
There are two types of data model :
1.Logical data model
2.Physical data model
Logical Data Model :
1.The logical data model is the representation of your database in a logical way.
2.The logical data model is actually a representation of the business requirements in a logical way.
3.The logical data model is the actual implementation and extension of a conceptual data model.
4.Following are the different things used for creating a logical data model :
Entities, Attributes, Super Types, Sub Types, Primary Key, Alternate Key, Inversion Key Entry, Rule, Relationship, Definition, Business Rule, etc.
Physical Data Model :
1.The physical data model is the representation of the physical database.
2.The physical data model includes all required tables, columns, relationships and database properties for the physical implementation of the database.
3.Database performance, indexing strategy, and physical storage are important parameters of a physical model.
4.The main object in a database is a table, which consists of rows and columns. The approach by which physical data models are created is called physical data modeling.
Question 12 : Explain how ETL is used in data warehousing.
Near real time report development : When there is a need to fetch the data from multiple sources and use it for reporting, we require the ETL concept. Nowadays near real time reports are coming into the limelight, so we use ETL to create them.
Data warehouses : Data warehouse data is a combination of historical data and transactional data, and its data sources might be different. The user needs to fetch the data from multiple heterogeneous systems and load it into a single target system, which is called the data warehouse.
Example for warehousing :
As the ETL definition suggests, ETL is the extraction, transformation and loading of data, and this process is widely used in data warehousing. A simple example is managing the sales data of a shopping mall. If the user wants the historical data as well as the current data of the shopping mall, the first step is always to follow the ETL process. That data is then used for reporting.
Question 13 : What is meant by a data mart? What is the use of a data mart in ETL?
A data mart is a simple form of data warehouse, and it is focused on a single functional area. It gets its data from only a few sources.
Use of data mart in ETL : ETL is used when we need to transform and load data for a single functional area from multiple sources. As an example, consider a customer management department for which we need to fetch data from Oracle as well as MySQL databases. We need to use the ETL process to extract the data from the sources and load it into the target system.
Question 14 : What are the different characteristics of a data warehouse?
- A data warehouse is a database that is separate from the operational database and that also stores historical information.
- A data warehouse database contains transactional as well as analytical data.
- A data warehouse helps the higher management to take strategic as well as tactical decisions using historical or current data.
- A data warehouse helps with consolidated analysis of historical data.
- A data warehouse helps the business user to see the current trends to run the business.
- A data warehouse is used for reporting and data analysis.
Question 15 : What are cubes? Explain with example. (60% asked ETL interview questions with answers)
Definition of cubes : For multidimensional analysis of data warehouses we need to create data processing units comprised of facts and dimensions. Those data processing units are called cubes.
Example of a cube (OLAP cube) :
OLAP stands for Online Analytical Processing, and an OLAP cube stores large data in multi-dimensional form for reporting purposes. It consists of facts, called measures, categorized by dimensions.
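The idea of a measure categorized by dimensions can be sketched in plain Python. This is not how an OLAP engine is implemented. It is a toy illustration with hypothetical sales rows, where `build_cube` pre-aggregates the `amount` measure for every combination of the three dimensions.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sales facts with three dimensions and one measure.
sales = [
    {"year": 2023, "region": "East", "product": "Pen", "amount": 10},
    {"year": 2023, "region": "West", "product": "Pen", "amount": 20},
    {"year": 2024, "region": "East", "product": "Book", "amount": 30},
    {"year": 2024, "region": "East", "product": "Pen", "amount": 5},
]
DIMENSIONS = ("year", "region", "product")


def build_cube(rows, dimensions, measure):
    """Pre-aggregate the measure for every combination of dimensions."""
    cube = defaultdict(int)
    for row in rows:
        for size in range(len(dimensions) + 1):
            for dims in combinations(dimensions, size):
                # The key lists the chosen dimensions with this row's values.
                key = tuple((dim, row[dim]) for dim in dims)
                cube[key] += row[measure]
    return dict(cube)


cube = build_cube(sales, DIMENSIONS, "amount")

grand_total = cube[()]
east_total = cube[(("region", "East"),)]
pens_2023 = cube[(("year", 2023), ("product", "Pen"))]
```

Because every combination is aggregated in advance, a report can read any slice with a single lookup instead of scanning the fact rows again.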
Question 16 : What is meant by staging area? Explain the importance of the staging area.
The staging area is an optional intermediate data processing area where we place the data extracted from multiple heterogeneous sources. Following are the usages of staging the data from multiple sources :
Faster data processing : Processing data from the staging area is faster than processing it directly from the sources. The staged data is already available, so the transformation process can use it without going back to the source systems.
Auditing of data : The staging data is useful for auditing purposes. We can use audit tables in the staging area to make sure that the data processing is done properly in the data warehouse database. The audit tables record the history of each table.
Recovery of data : In case of any kind of failure the staging data is available as a backup, and we can transform that data again. The staging data is used as a recovery checkpoint.
Loading performance : It improves the performance of the transformation and makes the loading of the data faster.
Question 17 : What is meant by dimension table? Explain with example.
A dimension table is a table that describes the business entities of an enterprise, that is, the objects referred to in a fact table. A dimension table has a primary key that uniquely identifies each dimension row. A dimension table is sometimes called a lookup or reference table. The primary key of the dimension table is used to relate it to the fact table, which contains the matching foreign key. Dimension tables are normally in de-normalized form because these tables are only used to analyse the data and not to execute transactions.
The fields in a dimension table are used to fulfil the following 3 important requirements :
- Query constraining
- Grouping / Filtering
- Report labeling
Real Life Example :
Consider the following table, which contains item information. In this table ITEM KEY is the primary key that uniquely identifies the rows of the dimension table. ITEM KEY will also be present in the fact table.
| ITEM KEY | ITEM NAME | BRAND | SOLD BY | CATEGORY |
|---|---|---|---|---|
In the image I have marked which tables are fact tables and which are dimension tables. You will be able to see that there are four dimensions.
Question 18 : What is meant by data transformation? Explain with example.
1.Data transformation is the second phase of the ETL process, where the data needs to be cleansed properly before it gets loaded into the target system.
2.In the data transformation phase we apply a series of rules to process the data.
3.The important function of data transformation is cleansing the data and adding the data to the target according to the business requirements.
4.Example of data transformation : If we need first name and last name together in the reporting or data warehousing database, we use a data transformation to merge First name + Last name.
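The first name and last name example can be sketched in Python as follows. The field names `first_name`, `last_name` and `full_name` are assumptions made for the illustration. The function also trims extra spaces and copes with a missing part, which is a typical piece of cleansing done in the same step.

```python
def merge_name(record):
    """Return a copy of the record with a cleansed full_name field added."""
    parts = [
        (record.get("first_name") or "").strip(),
        (record.get("last_name") or "").strip(),
    ]
    # Skip empty parts so a missing last name does not leave a trailing space.
    full_name = " ".join(part for part in parts if part)
    return {**record, "full_name": full_name}


merged = merge_name({"first_name": " Asha ", "last_name": "Rao"})
only_first = merge_name({"first_name": "Maya", "last_name": None})
```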
Question 19 : Explain multiple examples of transformation in the ETL process.
Following are some important examples of transformation :
- Select certain columns or rows for load: Sometimes the requirement is to load only certain columns or rows. For example, if we want to load the employees who are working in the company to date, we may need to remove the employees whose salary is null. This is called a transformation.
- Translating coded values: Sometimes the requirement is to translate coded values. For example, if the source uses 1 for male and 2 for female, we need to transform these values to M for male and F for female.
- Encode the values: We need to encode free-form values during the transformation (Example: for “Female” use “F”).
- Calculations: This is another important type of transformation, where we need to store calculated values (Example: Percentage profit = profit / sale * 100).
- Performance improvement using sorting techniques: We sort or order the data based on a list of columns to improve search performance.
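Several of these transformations can be combined in one small Python function. This is a minimal sketch with hypothetical employee rows and field names. It filters out rows with a null salary, keeps only the needed columns, translates the gender codes, calculates the percentage profit and sorts the result by name.

```python
# Code translation taken from the example above: 1 = male, 2 = female.
GENDER_CODES = {1: "M", 2: "F"}

# Hypothetical source rows.
employees = [
    {"name": "Ravi", "gender_code": 1, "salary": 4000, "profit": 25, "sale": 100},
    {"name": "Asha", "gender_code": 2, "salary": 5000, "profit": 40, "sale": 80},
    {"name": "Kiran", "gender_code": 1, "salary": None, "profit": 10, "sale": 50},
]


def transform(rows):
    """Filter, select columns, translate codes, calculate and sort."""
    result = []
    for row in rows:
        if row["salary"] is None:
            continue  # remove employees whose salary is null
        result.append({
            "name": row["name"],                           # selected column
            "gender": GENDER_CODES[row["gender_code"]],    # translated code
            "profit_pct": row["profit"] / row["sale"] * 100,  # calculation
        })
    # Sorting on name keeps later lookups and comparisons simple.
    return sorted(result, key=lambda r: r["name"])


transformed = transform(employees)
```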
Question 20 : How do you maintain a logging mechanism in ETL?
Logging is extremely important to keep track of all changes and failures during a load. The most common ways to prepare for logging are to use flat files or a logging table. That is, during the process, counts, timestamps, and metadata about the source and target are added and then dumped into a flat file or table.
This way the load can be checked for invalid runs. When such a table or file exists, the next step would be to prepare notifications. This could be a report, or a simple formatted email, describing the load as soon as it finishes (e.g. the number of processed records compared to the previous load.)
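A logging table of this kind can be sketched with Python's built-in `sqlite3` module. The table name `etl_load_log`, its columns and the status values are assumptions for the illustration, not a standard. The last function builds the text of a simple notification that compares the latest load with the previous one.

```python
import sqlite3
from datetime import datetime, timezone


def create_log_table(conn):
    """Create the hypothetical logging table if it does not exist yet."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS etl_load_log (
            run_id INTEGER PRIMARY KEY AUTOINCREMENT,
            source TEXT, target TEXT,
            started_at TEXT, finished_at TEXT,
            rows_read INTEGER, rows_loaded INTEGER,
            status TEXT
        )
        """
    )


def log_run(conn, source, target, started_at, rows_read, rows_loaded):
    """Record counts, timestamps and source/target metadata for one load."""
    status = "OK" if rows_read == rows_loaded else "CHECK"
    finished_at = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT INTO etl_load_log "
        "(source, target, started_at, finished_at, rows_read, rows_loaded, status) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (source, target, started_at, finished_at, rows_read, rows_loaded, status),
    )
    conn.commit()


def summarize_last_run(conn):
    """Build notification text comparing the latest load with the previous one."""
    rows = conn.execute(
        "SELECT rows_loaded, status FROM etl_load_log "
        "ORDER BY run_id DESC LIMIT 2"
    ).fetchall()
    latest_loaded, status = rows[0]
    if len(rows) < 2:
        return f"Load {status}: {latest_loaded} rows loaded (no previous load)."
    return f"Load {status}: {latest_loaded} rows loaded, previous load had {rows[1][0]}."


conn = sqlite3.connect(":memory:")
create_log_table(conn)
start = datetime.now(timezone.utc).isoformat()
log_run(conn, "orders.csv", "dw.orders", start, rows_read=100, rows_loaded=100)
log_run(conn, "orders.csv", "dw.orders", start, rows_read=120, rows_loaded=118)
summary = summarize_last_run(conn)
```

The `summary` string could be sent by email or written to a report. A status of `CHECK` marks a run where the number of rows read and loaded differ.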
Question 21 : How is ETL used in data migration projects? Explain with example. (60% asked ETL Testing Interview Questions)
ETL tools are widely used in data migration projects. If the organization previously managed its data in Oracle 10g and now wants to move to a SQL Server cloud database, the data needs to be migrated from the source to the target. ETL tools are very useful for this kind of migration. Writing the ETL code by hand is a very time consuming process. ETL tools make this simpler, because the coding is simple compared to PL/SQL or T-SQL code. So the ETL process is very useful in data migration projects.
Question 22 : Explain multiple steps to choose an ETL tool. (90% asked ETL Testing Interview Questions)
Choosing an ETL tool is a very difficult thing. You need to consider a lot of factors while choosing the correct ETL tool for the project. Choosing the ETL tool for a specific project is a very strategic move, even if you need it for a small project. Keep in mind that ETL tool migrations are no small effort. In this section I would like to give you some points to consider while choosing your ETL tool.
1.Data Connectivity :
The ETL tool should be able to communicate with any source of data, no matter where the data comes from. This is very critical.
2.Performance :
Moving and changing data requires some serious processing power, so you need to check the performance factors.
3.Transformation Flexibility :
Matching, merging and changing the data is very critical. The ETL tool should provide these and many other transformation packages, which allow modifications to the data in the transformation phase with simple drag and drop.
4.Data Quality :
Your data is rarely clean. You can leverage your data only when it is consistent and clean.
5.Flexible Data Acquisition Options :
Once the ETL is ready, you need to check that it works on the previous data as well as on the newly arriving data.
6.Committed ETL Vendor :
You are working with the organization's data while doing the ETL process, so choose a vendor who is well known in the industry and whose support is really good.
Question 23 : Explain star schema in the ETL process with example.
In a star schema there is a fact table at the center, and all dimension tables surround that fact table. It is called a star schema because the diagram resembles a star with points radiating from the center. The star schema is used in simple data marts or data warehouses. It is designed in such a way that it optimizes querying on large data sets. In a star schema multiple dimension tables, kept in de-normalized form, are joined with only one fact table.
Real Life Example :
In the above diagram you will be able to see that the table named transaction is the fact table and all 5 other tables are dimension tables.
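A much smaller star schema can be sketched with Python's built-in `sqlite3` module. The tables here are hypothetical and are not the ones from the diagram: one fact table `fact_sales` is joined to two dimension tables, `dim_item` and `dim_date`, through their keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: descriptive attributes, one row per key.
    CREATE TABLE dim_item (
        item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT, category TEXT
    );
    CREATE TABLE dim_date (
        date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER
    );
    -- Fact table at the center: foreign keys plus numeric measures.
    CREATE TABLE fact_sales (
        item_key INTEGER REFERENCES dim_item(item_key),
        date_key INTEGER REFERENCES dim_date(date_key),
        quantity INTEGER,
        amount REAL
    );
    INSERT INTO dim_item VALUES
        (1, 'Pen', 'Acme', 'Stationery'),
        (2, 'Mug', 'Acme', 'Kitchen');
    INSERT INTO dim_date VALUES (20240101, 2024, 1), (20240201, 2024, 2);
    INSERT INTO fact_sales VALUES
        (1, 20240101, 3, 6.0),
        (1, 20240201, 2, 4.0),
        (2, 20240201, 1, 9.0);
    """
)

# A typical star-schema query: join the fact table to its dimensions,
# then group the measure by dimension attributes.
sales_by_category = conn.execute(
    """
    SELECT i.category, d.month, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_item AS i ON i.item_key = f.item_key
    JOIN dim_date AS d ON d.date_key = f.date_key
    GROUP BY i.category, d.month
    ORDER BY i.category, d.month
    """
).fetchall()
```

Every join goes directly from the fact table to one dimension table, which is what keeps star schema queries simple.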
Question 24 : What is meant by granularity?
The granularity of a table represents the level of detail of the information stored in the table. In BI, granularity is a very important concept when checking the table data. Granularity can be high or low. High granularity data contains detailed information, for example transaction level data. Low granularity data contains only summarized information. A fact table is normally kept at the most detailed level, which means we need transaction level data in the fact table.
The following 2 points are important in defining granularity :
1.Determining the dimensions that are to be included.
2.Determining where to place the hierarchy of each dimension of information.
Real Life Example :
Date Dimension Granularity level :
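As a rough illustration of date granularity, the following Python sketch rolls the same hypothetical daily sales up to month and year level. The three levels and the sample rows are assumptions chosen for the example. Day level is the most detailed grain, and each roll-up gives fewer rows with less detail.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales: (sale date, amount).
daily_sales = [
    (date(2024, 1, 5), 10),
    (date(2024, 1, 20), 15),
    (date(2024, 2, 3), 7),
    (date(2025, 1, 1), 4),
]

# One key function per granularity level of the date dimension.
LEVELS = {
    "day": lambda d: d.isoformat(),
    "month": lambda d: f"{d.year}-{d.month:02d}",
    "year": lambda d: str(d.year),
}


def roll_up(rows, level):
    """Aggregate the amounts at the requested granularity level."""
    key_for = LEVELS[level]
    totals = defaultdict(int)
    for day, amount in rows:
        totals[key_for(day)] += amount
    return dict(totals)


by_day = roll_up(daily_sales, "day")
by_month = roll_up(daily_sales, "month")
by_year = roll_up(daily_sales, "year")
```

Data stored at day level can always be rolled up to month or year, but data stored only at year level cannot be broken down again, which is why the grain of a fact table has to be decided early.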
Question 25 : Explain the characteristics of a data warehouse.
- The data warehouse database is used to store the operational data as well as historical data.
- The data warehouse contains not only the transactional data but also the analytical data.
- Data warehouse databases are used for analyzing the data and for reporting.
- Using the data warehouse, the higher management takes strategic as well as tactical decisions.
- The data warehouse helps the business user to see the current trends to run the business.
These are some of the most important ETL interview questions with detailed answers. I hope you like this article. Kindly comment in the comments section if you like this article or if you have any concerns.