Data Modeling Interview Questions :
In my previous article i have explained about the data modeling techniques with its real life examples.In this article i would like to give you idea about Data Modeling Interview Questions. I have written so many articles which gives you the idea about different SQL interview questions which are important for the users. I need to explain the users about Data Modeling Interview Questions with answers in this article.Now a days data modeling becomes the backbone of any new technology like Business Intelligence.In this article i will give some most important Data Modeling Interview Questions with its answers so that its easy for user to face the interview.
Question 1 : What is Data Modeling? ( 100% asked Data Modeling Interview Questions )
Answer :
1.Data Modeling is nothing but the database design technique which is useful to fetch different types of complex sql queries in DWH environment.
2.The Data Modeling technique is nothing but representation of Logical data model and physical data model according to the Business requirements.
3.Data Modeling is most important Design technique which used to support the users in data warehousing.
4.Data Modeling always uses two types of tables you can say it as facts and dimensions tables.
5.Facts tables are the tables which contains numerical value which can be aggregated and analyzed on fact values.Dimension defines hierarchies and description of fact values.
Question 2 : Explain Fact tables with example.( 100% asked Data Modeling Interview Questions )
Answer :
Fact table is central table found in star schema or snowflakes schema which is surrounded by dimension tables.Fact table contains numeric values that are known as measurements.Fact table has two types of columns:
1.Facts
2.Foreign key of dimension tables.
Real Example :
Following is a fact table which contains all the primary keys of dimensions table and added measures for ITEM,i.e.Product sold.
ITEM KEY | Time key | Product key | Date key | Product Sold |
00001 | T001 | P001 | D001 | 100 |
00002 | T002 | P002 | D002 | 30 |
00003 | T003 | P003 | D003 | 15 |
The fact table contains the foreign keys,time dimensions,product dimension,customer dimension,measurement values.Following are some examples of common facts :
No of unit sold,Margin,Sales revenue and the dimension tables are customer,time and product e.t.c. which is used to analyse data.
Question 3 : What are different types of fact table?( 100% asked Data Modeling Interview Questions )
Answer :
There are following 3 types of fact tables :
1.Additive :
Measures that can be added across any dimension
2.Non-additive:
Measures that can not be added across any dimension
3.Semi-additive:
Measures that can be added across some dimensions.
Question 4 : What are different types of data model?( 100% asked Data Modeling Interview Questions )
There are two types of data model:
1.Logical Data Model
2.Physical data model
Logical Data Model:
1.The logical data model is nothing but the representation of your database in logical way.
2.The logical data model is actually a representation of business requirements in logical way.
3.The logical data model is the actual implementation and extension of a conceptual data model.
4.There are following different things used for creating logical data model:
Entities, Attributes, Super Types, Sub Types, Primary Key, Alternate Key, Inversion Key Entry, Rule, Relationship, Definition, business rule, etc
Physical Data Model :
1.The Physical data model is nothing but the representation of physical database.
2.Physical data model includes all required tables, columns, relationship, database properties for the physical implementation of databases.
3.Database performance, indexing strategy, and physical storage are important parameters of a physical model.
4.The important or main object in a database is a table which consists or rows and columns. The approach by which physical data models are created is called as physical data modeling.
Question 5 : What is database normalization?( 100% asked Data Modeling Interview Questions )
Answer :
Database Normalization is nothing but technique of designing the database in structured way to reduce redundancy and improve data integrity.
Database Normalization is used for following Purpose:
- To Eliminate the redundant or useless data
- To Reduce the complexity of the data
- To Ensure the relationship between tables as well as data in the tables
- To Ensure data dependencies and data is logically stored.
Question 6 : What are different check points for normalizing data?
Answer :
There are following checkpoints to normalize the data :
1.Arrangement of data into logical groups.
2.Minimize the Duplicate data.
3.Organize the data in such way that when modification needed then there should be only one place modification required.
4.User can access and manipulate data quickly and efficiently.
Question 7 : What is first normal form ? Explain with example.( 80% asked Data Modeling Interview Questions )
Answer :
The first normal form is the normal form of database where data must not contain repeating groups.The database is in First normal form If,
1.It contains only automic values.
Automic values:- The Single cell have only single value
2.Each Record needs to be unique and there are no repeating groups.
Repeating Groups:- Repeating group means a table contains 2 or more values of columns that are closely related.
Example:
Consider following table which is not normalized:
Employee Table:
Employee No | Employee Name | Department |
1 | Amit | OBIEE,ETL |
2 | Divya | COGNOS |
3 | Rama | Administrator |
To bring it in to first normal form We need to split table into 2 tables.
First table:Employee Table
Employee No | Employee Name |
1 | Amit |
2 | Divya |
3 | Rama |
Second Table: Department table
Employee No | Department |
1 | OBIEE |
1 | ETL |
2 | COGNOS |
3 | Administrator |
We have divided the table into two different tables and the column of each table is holding the automic values and duplicates also removed.
Question 8 : Explain Second Normal Form with example.( 80% asked Data Modeling Interview Questions )
Answer :
The data is said to be in second normalized form If,
1.It is in First normal form
2.There should not be any partial dependency of any column on primary key.Means the table have concatanated primary key and each attribute in table depends on that concatanated primary key.
3.All Non-key attributes are fully functionally dependent on primary key.If primary is is not composite key then all non key attributes are fully functionally dependent on primary key.
Example:
Let us consider following table which is in first normal form:
Employee No | Department No | Employee Name | Department |
1 | 101 | Amit | OBIEE |
2 | 102 | Divya | COGNOS |
3 | 101 | Rama | OBIEE |
In above example we can see that department .Here We will see that there is composit key as{ Employee No,Department No}.Employee No is dependent on Employee Name and Department is dependent on Department No.We can split the above table into 2 different tables:
Table 1:Employee_NO table
Employee No | Department No | Employee Name |
1 | 101 | Amit |
2 | 102 | Divya |
3 | 101 | Rama |
Table 2:Department table
Department No | Department |
101 | OBIEE |
102 | COGNOS |
Now we have simplified the table in to second normal form where each entity of table is functionally dependent on primary key.
Question 9 : Explain Third Normal Form with example.( 80% asked Data Modeling Interview Questions )
Answer :
The database is in Third normal form if it satisfies following conditions:
1.It is in Second normal form
2.There is no transitive functional dependency
Transitive Dependency:
When table 1 is Functionally dependent on table 2. and table 2 is functionally dependent on table 3 then.table 3 is transitively dependent on table 1 via table 2.
Example:
Consider following table:
Employee No | Salary Slip No | Employee Name | Salary |
1 | 0001 | Amit | 50000 |
2 | 0002 | Divya | 40000 |
3 | 0003 | Rama | 57000 |
In above table Employee No determines the Salary Slip No.And Salary Slip no Determines Employee name.Therefore Employee No determines Employee Name.We have transitive functional dependency so that this structure not satisfying Third Normal Form.
For That we will Split tables into following 2 tables:
Employee table:
Employee No | Salary Slip No | Employee Name |
1 | 0001 | Amit |
2 | 0002 | Divya |
3 | 0003 | Rama |
Salary Table:
Salary Slip No | Salary |
0001 | 50000 |
0002 | 40000 |
0003 | 57000 |
Following are 2 Advantages of 3rd normal form:
1.Amount of data duplication is removed because transitive dependency is removed in third normal form.
2.Achieved Data integrity.
Question 10 : Explain Boyce Code Normal Form with example.( 50% asked Data Modeling Interview Questions )
Answer :
BCNF Normal form is higher version of third normal form.This form is used to handle analomies which are not handled in third normal form.BCNF does not allow dependencies between attributes that belongs to candidate keys.It drops restriction of the non key attributes from third normal form.
Third normal form and BCNF are not same if following conditions are true:
1.The table has 2 or more candidate keys
2.At least two of candidate keys are composed of more than 1 attribute
3.The keys are not disjoint.
Example:
Address-> {City,Street,Zip}
Key 1-> {City,Zip}
Key 2->{City,Street}
No non key attribute hence this example is of 3 NF.
{City,Street}->{zip}
{Zip}->{City}
There is dependency between attributes belonging to key.Hence this is BCNF.
Question 11 : What is Dimension table? Explain with example.( 100% asked Data Modeling Interview Questions )
Answer :
Dimension table is table which describes the business entities of an enterprise which describes the objects in a fact table.Dimension table has primary key which uniquelly identifies each dimension row.Dimension table is sometimes called as lookup or reference table.The primary key of dimension table is used to associate relationship between fact table which contains foreign key.Dimension tables are normally in de-normalized form because these tables are only used to analyse the data and not used to execute transactions.
The fields in a dimension table is used to complete following 3 important requirement :
- Query Constrainting
- Grouping /Filtering
- Report labeling
Real Life Example :
Consider following table which contains item information.In the following table ITEM KEY is primary key which uniquelly identifies the rows in the dimension table. ITEM KEY will be present in Fact table.
ITEM KEY | ITEM NAME | BRAND | SOLD BY | Category |
00001 | Yellow shirt | Yazaki | Amit | Shirts |
00002 | Football | Start sports | Rahul | Sports |
00003 | Blue Shorts | Puma | Amit | Shorts |
In the image i have explained which are fact and which are dimension tables. You will able to see there are four dimesions :
1.Time
2.Location
3.Item
4.Branch
Question 12 : Explain What is Aggregate table with Example.( 100% asked Data Modeling Interview Questions )
Answer :
Aggregate table contains aggregated data which can be calculated by using different aggregated functions like count,avg,min,max.e.t.c.Aggregated tables are most widely used tables in OLAP database.Aggregate functions are functions where the values of table or column are grouped together and form a single value. Following are some aggregate functions:
1.Average
2.Count
3.MAX
4.MIN
5.Median
6.SUM
Using the above aggregate functions the data will be inserted in aggregate table.The aggregate tables are used for performance optimization and data is coming fast using aggregate table.Aggregations applied on database level improves the performance of the query as the query not hits directly on table it will hit on aggregate table and fetches data.
Real Example:
If table contains the data of year 2016 and 2017 ,User wants a actual count of records in the table monthly,Quarterly and yearly.Then We need to make aggregate table which contains count of records monthly,For Quarterly we need to create other table and push the quarterly count in that table.And we need to use that tables in the reports so that report performance will improve drastically.
Question 13 : Explain Primary key with example.
Answer :
The SQL PRIMARY KEY Constraint Uniquely identifies each record in a database table.
PRIMARY KEY must contain unique values. A Primary Key column cannot contain NULL values. Each table can have only ONE PRIMARY KEY.
Example :
CREATE TABLE Student
(
RollNo Number (10),
FName Varchar2 (15),
LName Varchar2 (15),
Location Varchar2 (10),
CONSTRAINT PK_Student_FName_LName PRIMARY KEY (FName, LName)
);
Question 14 : Explain SQL Check Constraint.
Answer :
The CHECK constraint is used to limit the value range that can be placed in a column.
If you define a CHECK constraint on a single column it allows only certain values for this column.
If you define a CHECK constraint on a table it can limit the values in certain columns based on values in other columns in the row.
Real Life Scenario:
CREATE TABLE Student
(
Class Number (5),
RollNo Number (10),
Subject Varchar2 (15),
Fees Number (10, 2) CHECK (Fees>500)
);
In above example we have added the Check constraint for Fees column.
Question 15 : What are different type of Schema used in Data Modeling?
Answer :
There are two types of schemas used in Data Modeling:
- Star Schema
- Snowflake Schema
Question 16 : Explain Star Schema with Example.
Answer :
In Star schema there is Fact table as a center and all dimension tables surrounded with that fact table.It is called as Star schema because diagram resembles a star with points radiating from center.Star schema is used in simple data mart or data warehouse.Star schema is designed in such way that it will optimize the querying on large data sets.In Star schema multiple dimension tables joined with only one fact table in de-normalized form.
Real Life Example :
In above diagram you will able to see that table named transaction is fact table and all 5 are Dimension tables.
Question 17 : Explain Granularity.( 100% asked Data Modeling Interview Questions )
Answer :
Granularity in table represents the level of information stored in the table.In BI granularity is very important concept to check the table data.The granularity is high and low .High granularity data contains the data with high information or you can say it as transaction level data is high granularity data.Low granularity means data has low level information only.Fact table always have low granularity mean we need very low level data in fact table.
Following 2 points are important in defining granularity :
1.Determining the dimensions that are to be included
2.Determining location to place hierarchy of each dimension of information.
Real life Example :
Date Dimension Granularity level :
Year,month,quarter,period,week,day.
Question 18 : Explain Snowflake Schema with example.
Answer :
Snowflake schema is a form of dimensional modeling where dimensions are stored with multiple dimension tables.Snowflake schema is variation over star schema.The schema is diagrammed as each fact is surrounded with dimensions;and some dimensions are further related to other dimensions which are branched in snowflake pattern.In snowflake schema multiple dimension tables are organized and joined with fact table.Only difference between star and snowflake schema is dimensions are normalized in snowflake schema.
Real life Example :
In Diagram i shown the snowflake schema where sales table is a fact table and all are dimensions.Store table is further normalized in to different tables name city,state and region.
Question 19 : Explain Difference between OLTP and OLAP?( 100% asked Data Modeling Interview Questions )
Answer:
Following Table shows difference between OLAP and OLTP system:
OLAP | OLTP |
Data Storage:It stores only historical data and historical data processing is done. | Data Storage:It involves daily processing of data |
Users Of System:OLAP System is used by higher management like managers , analysts , executives,CEO,CFOs | Users Of System:OLTP system used by DBAs, Database Professionals ,Programmers for applying business logic. |
Key use:OLAP is used to analyse the business | Key use:OLTP is used to run the business |
Optimization Techniques:OLAP is very huge database so lot of indexes are used for fast data processing | Optimization Techniques:OLTP uses less indexing as data is less here |
Database Schema:OLAP uses Star-Schema,Snowflakes schema or Fact-Dimensions | Database Schema:OLTP uses Entity Relations |
Data Information:OLAP contains historical data | Data Information:OLTP contains Current data |
Join Data:OLAP has less joins and in de-normalized form database | Join Data:OLTP has large no of joins and in normalized form |
Aggregated Data:OLAP system has aggregated multidimensional data | Aggregated Data:OLTP has not aggregated data. |
Summarized data:OLAP system gives Summarized consolidated data | Detailed Data:OLTP system gives data in detailed format |
Data Size:OLAP database size is 100 GB to 100 TB | Data Size:OLTP database size is 100 MB to 100 GB |
Question 20 : Explain Hierarchies in data modeling with its types.
Answer :
Hierarchies is series of many to one relationships which has different levels.
Question 21 : Explain what is mean by Confirmed dimension.
Answer:
A Dimension that is utilized as a part of different areas is called as conformed dimension. It might be utilized with different fact tables in a single database or over numerous data marts/warehouses. The Confirmed dimension is the dimension which is connected to two or more fact tables.
Real Life Example:
If Order table is connected to product fact as well as Order item fact and user needs to fetch data from both tables then Order dimension will be joined to Product as well as Order item table.
Question 22 : What is Slowly Changing Dimensions? Explain with Example.( 100% asked Data Modeling Interview Questions )
Answer :
Slowly Changing dimensions are dimensions which are used to manage both historical data as well as the current data in data-warehousing.
There are following different types of Slowly changing dimensions:
SCD Type 0 :
There are some attributes which will not be changed any point of time is called as Type 0 SCD.
Example :
DateofBirth is the best example of SCD Type 0 which will not change at any point of time.
SCD Type 1: These are the dimensions where previous value of the attribute is replaced by the current value. The Historical data is not maintained in SCD Type 1.
Example :
The History is not maintained in SCD Type 1. If Employee has changed the location then location will be updated to current location.
SCD Type 2: These are dimensions in which the historical value of attribute is preserved in column and new value is updated with insert statement.The new columns needs to be added to achieve this.
Example :
The History is not maintained in SCD Type 1. If Employee has changed the location then location will be updated to current location.Here the location is not updated the new columns like Start date and end date is created and if the Employee has changed the location on specified date it will add that date and for second record End date is blank.
SCD Type 3 :These are the type of dimensions where limited history is preserved. And we use an additional column to maintain the history.
Example:
If Employee has changed the City then new column will be added as new city and in that column the new city is added and old city name is preserved in City column.
Question 23 : Explain what is data mart.
Answer:
Data Mart is a simplest set of Data warehouse which is used to focus on single functional area of the business.We can say Data Mart is a subset of Data warehouse which is oriented to specific line of business or specific functional area of business such as marketing,finance,sales e.t.c. The data come in to Data Mart by different transactional systems,other data warehouse or external sources.
Data Mart is simply a subset of Organization’s Data warehouse.
Question 24 : Explain Data Mart Vs Data Warehouse?
Answer:
Data warehouse | Data mart |
Data warehousing is subject oriented, time variant, non-volatile collection of data which is used for creation of transactional reports as well as historical reports. | Data Mart is simply a subset of Organization’s Data warehouse |
Definition:The Data Warehouse is a large repository of data collected from different organizations or departments within a corporation.
|
Definition:
The data mart is an only subtype of a Data Warehouse. It is designed to meet the need of a certain user group. |
Focus:
Data warehouse focuses on multiple business areas.
|
Focus:
Data mart focuses only on single subject area. |
Usage:It helps to take a strategic decision.
|
Usage:
The data mart is used to take tactical decisions for growth of business. |
Type of system :
This is centralized system where one fact is at center surrounded by dimension tables. |
Type of system :
Data mart system is de centralized system |
Scope of Business:The Data warehouse will cover all business areas. | Scope of Business:The scope of Data mart is within the line of the Business. |
Data Model:
Data warehouse always follows top-down model |
Data Model:
Data mart always follows bottom-up model. |
Data Size:
Data warehouse contains all historical data so the database size is large.Approximate size of data warehouse is greater than 100 GB. |
Data Size:
Data mart contains data with only one business area so the size of database is smaller than data warehouse. |
Source:
Data warehouse data comes from multiple heterogeneous data sources. |
Source:
Data mart data is data of only one business area.Many times it will come from only one data source. |
Implementation Time:
Data warehouse contains all data which will come from multiple data sources. It will take time to build data warehouse. The Time to build data warehouse is months to years. |
Implementation Time:
Data mart is small data warehouse which will contain the data of only a single business area. The implementation time to build data mart is in months. |
Question 25 : Explain Common Errors you faced while doing data modeling?
Answer:
There are following common errors user will face while doing data modeling:
- Building Large Data Models: Building large data models is not a good practice. If tables are more than 200 the data model become more and more complex. In that case that data model will fail.
- Unnecessary Suggogate keys: Surrogate key should not be used unnecessarily. Use surrogate key only when the natural key cannot serve the purpose of a primary key.
- Purpose Missing: There are so many situations where user does not know the purpose of the business. If user does not have proper understanding of the specified business there is no way to create specified data model for that business. So there is need to know the purpose of the business before creating data model.
- Inappropriate Denormalization : Don’t denormalize until and unless you have a solid & clear business reason to do so because de-normalization creates redundant data which is difficult to maintain.
These are some most important Data Modeling interview Questions with Answers. If You like this article or if you have any suggestions with the same kindly comment it in comment section.
Thanks for this article..it is a clear explanation.
Thanks veera for your lovely comments 🙂
Good overview of data modelling questions. Thanks.
Thank u very much Ashish for lovely comments..
nice work man..
Thanks arun for good words !!!
Nice compilation of questions.
Thanks Shubhabrata for nice words!!
Thanks for thing article i am learning a lot.
Thanks Andronicca for Kind words!!
Thanks!I learn the basic data modeling concept.
Thanks for your good words!
Nice. I inferred more information while reading this article.
Thanks Senthil for your good words 🙂
Really nice article, very well articulated. Thanks alot Amit
Thank you so much Raj for your good words!