Data Mining Interview Questions

What are the latest Data Mining Interview Questions?

Data Mining Interview Questions:

In my previous article I gave an overview of data mining with examples. This article gives you Data Mining Interview Questions with answers. Data mining is the process of sorting through large data sets to identify patterns and establish relationships that help solve problems through data analysis. Finding the patterns and fetching the data you need is what matters. This article covers both basic and advanced Data Mining Interview Questions with their answers; these are among the questions most frequently asked in interviews.

Following are the different types of Data Mining Interview Questions with Answers:

1. What is Data Mining? (100% asked Data Mining Interview Question)

Answer:

Data mining is the process of extracting hidden predictive information from huge databases. Also known as the knowledge discovery process, it analyzes data from different perspectives and summarizes it into useful information.

Data mining is the process of discovering patterns in very large data sets using methods from machine learning, statistics and database systems.

2. Define Data Mining. (100% asked Data Mining Interview Question)

Answer:

The following are different definitions of data mining:

Definition 1:

Data mining is the process used to extract hidden predictive information from huge databases.

Definition 2:

Data mining is the process of discovering patterns in very large data sets using methods from machine learning, statistics and database systems.

Definition 3:

Data mining is defined as a process used to extract usable data from a larger set of raw data. It involves analyzing data patterns in large batches of data using one or more software tools.

Definition 4:

Data mining is the automated extraction of hidden information from large databases.

Definition 5:

Data mining refers to the process of extracting valid and previously unknown information from a large database to make crucial business decisions.

3. What is the basic difference between data mining and data warehousing?

Answer:

Data Warehousing:

Data warehousing is simply extracting data from different sources, cleaning the data and storing it in the warehouse.

Data Mining:

Data mining, on the other hand, aims to examine or explore the data using queries. These queries can be run on the data warehouse. Exploring the data helps in reporting, planning strategies, finding meaningful patterns, etc.

Example:

A company's data warehouse stores all the relevant information about its projects and employees. Using data mining, one can use this data to generate different reports, such as the profits generated.

4. What are the various features of Data Mining?

Answer:

Following are the different features of data mining:

• Automatic pattern predictions based on trend and behaviour analysis.

• Prediction based on likely outcomes.

• Creation of decision-oriented information.

• Focus on large data sets and databases for analysis.

• Clustering based on finding and visually documenting groups of facts not previously known.

5. Explain data purging. (100% asked Data Mining Interview Question)

Answer:

The process of cleaning out junk data is termed data purging. Purging data means getting rid of unnecessary data, such as rows with unwanted NULL values in their columns. This usually happens when the size of the database grows too large.

Data purging is an important activity for database management systems. Junk data occupies database storage and slows down database performance, so frequent purging keeps the database fast.
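As a minimal sketch of purging, assuming a hypothetical `employees` table in an in-memory SQLite database where rows missing a `name` or `dept` count as junk, the purge is a single `DELETE` run inside a transaction, followed by SQLite's `VACUUM` to release the freed space. What counts as "junk" is a business rule; the NULL check here is only an illustrative assumption.

```python
import sqlite3

def purge_junk_rows(conn: sqlite3.Connection) -> int:
    """Delete rows whose required columns are NULL and return how many were removed.

    The table name and the 'required columns' rule are hypothetical examples.
    """
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "DELETE FROM employees WHERE name IS NULL OR dept IS NULL"
        )
    conn.execute("VACUUM")  # rebuild the database so the freed pages are released
    return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO employees (name, dept) VALUES (?, ?)",
    [("Asha", "Sales"), (None, "HR"), ("Ravi", None), ("Meera", "IT")],
)
conn.commit()

removed = purge_junk_rows(conn)
remaining = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
print(removed, remaining)  # 2 2
```

In a production warehouse the same idea is usually scheduled as a periodic job, so the table never grows far beyond its useful size.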

6. Explain the different types of storage models in OLAP. (100% asked Data Mining Interview Question)

Answer:

The following are the different types of storage models in OLAP:

1. MOLAP – Multidimensional Online Analytical Processing
2. ROLAP – Relational Online Analytical Processing
3. HOLAP – Hybrid Online Analytical Processing

7. Explain MOLAP (Multidimensional Online Analytical Processing) with its advantages and disadvantages.

Answer:

1. As the name MOLAP suggests, the storage is multidimensional.

2. In this type of data storage, the data is stored in multidimensional cubes and not in standard relational databases.

The advantage of using MOLAP is:

Query performance is excellent, because the data is stored in multidimensional cubes and the calculations are pre-generated when a cube is created.

The disadvantages of using MOLAP are:

1. Only a limited amount of data can be stored. Since the calculations are performed during the cube generation process, it cannot handle huge amounts of data.
2. It needs a lot of skill to use.
3. It also has licensing costs associated with it.

8. What are Cubes?

Answer:

A data cube stores data in a summarized form, which allows faster analysis of the data. The data is stored in a way that makes reporting easy.

Example:

Using a data cube, a user may want to analyze the weekly and monthly performance of an employee. Here, month and week could be considered the dimensions of the cube.
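To make the employee-performance example concrete, here is a small sketch in plain Python, assuming hypothetical fact rows of `(employee, month, week, tasks_completed)`. Every combination of the month and week dimensions is pre-aggregated up front, with `None` standing for "all values" of a dimension, so monthly totals, weekly totals and the grand total become simple lookups instead of fresh calculations.

```python
from collections import defaultdict

# Hypothetical fact rows: (employee, month, week, tasks_completed)
facts = [
    ("Asha", "Jan", 1, 5),
    ("Asha", "Jan", 2, 7),
    ("Asha", "Feb", 1, 4),
    ("Ravi", "Jan", 1, 6),
]

def build_cube(rows):
    """Pre-aggregate tasks_completed for every combination of the month and week dimensions.

    None acts as 'ALL' for a dimension, so roll-up cells are stored next to the detail cells.
    """
    cube = defaultdict(int)
    for employee, month, week, value in rows:
        for m in (month, None):
            for w in (week, None):
                cube[(employee, m, w)] += value
    return dict(cube)

cube = build_cube(facts)
print(cube[("Asha", "Jan", None)])  # monthly total for January: 12
print(cube[("Asha", None, 1)])      # week 1 across all months: 9
print(cube[("Asha", None, None)])   # grand total: 16
```

This also shows the MOLAP trade-off described above: the number of stored cells grows with every dimension added, which is why cubes struggle with very large data.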

9. What is OLAP? Explain with an example.

Answer:

OLAP is a technology used in many Business Intelligence applications that involve complex analytical calculations. OLAP is used for complex calculations, trend analysis and sophisticated data modeling. An OLAP database is stored in a multidimensional database model. An OLAP system handles fewer transactions but performs complex calculations such as aggregations (sum, count, average, min, max, etc.).

The aggregated data in an OLAP system is typically organized by weeks, months, quarters, years, etc. The key purpose of an OLAP system is to reduce query response time and increase the effectiveness of reporting. If these aggregated calculations are already stored in the repository and the user needs fast access to the data, the user can use the OLAP system. An OLAP database stores aggregated historical data in a multidimensional schema.

Real Example:

Suppose the company head wants information about the salaries of resources (employees) in the year 2000.

Instead of using the transactional system, we use the OLAP system here, where the aggregated data for the year 2000 is already present.
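A minimal sketch of that idea, using SQLite as a stand-in for both systems and hypothetical tables `salary_payments` (transactional rows) and `salary_by_year` (the pre-aggregated summary). The aggregation runs once, for example in a nightly load, and the report for the year 2000 then reads a single stored row rather than scanning every payment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical transactional table: one row per salary payment.
conn.execute("CREATE TABLE salary_payments (emp_id INTEGER, pay_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO salary_payments VALUES (?, ?, ?)",
    [(1, "2000-01-31", 1000.0), (1, "2000-02-29", 1000.0),
     (2, "2000-01-31", 1500.0), (2, "2001-01-31", 1600.0)],
)

# Aggregate once into a summary table keyed by year (the OLAP-style store).
conn.execute(
    """CREATE TABLE salary_by_year AS
       SELECT substr(pay_date, 1, 4) AS year, SUM(amount) AS total_salary
       FROM salary_payments
       GROUP BY year"""
)

# The report now reads one pre-aggregated row.
total_2000 = conn.execute(
    "SELECT total_salary FROM salary_by_year WHERE year = '2000'"
).fetchone()[0]
print(total_2000)  # 3500.0
```

Real OLAP servers store such aggregates along many dimensions at once; the single "year" dimension here is a simplification.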

10. What is an OLTP transaction? Explain with an example.

Answer:

An OLTP system handles a large number of small daily transactions such as inserts, updates and deletes. The operational database is known as the OLTP system. An OLTP system provides fast query processing and is also responsible for data integrity and data consistency. The effectiveness of an OLTP system is measured in transactions per second. An OLTP system normally contains current data, and its tables are properly normalized.

Real Example:

Suppose the company head wants a transactional report of all employees' in and out times.

As the company head wants a daily report of in-out times, we need to provide it from the OLTP system and schedule the report on a daily basis.
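The following sketch shows the OLTP side of that example, assuming a hypothetical `attendance` table in SQLite. Each badge swipe is recorded as its own small atomic transaction, a `CHECK` constraint protects data integrity by rejecting invalid events, and the daily in-out report is a simple query over current data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE attendance (
           emp_id INTEGER NOT NULL,
           event TEXT NOT NULL CHECK (event IN ('IN', 'OUT')),
           ts TEXT NOT NULL)"""
)

def record_swipe(conn, emp_id, event, ts):
    """Record one badge swipe as a small atomic transaction (hypothetical helper)."""
    try:
        with conn:  # commit on success, roll back on any error
            conn.execute(
                "INSERT INTO attendance (emp_id, event, ts) VALUES (?, ?, ?)",
                (emp_id, event, ts),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint rejected the row; nothing was written

ok1 = record_swipe(conn, 7, "IN", "2024-05-01 09:02")
ok2 = record_swipe(conn, 7, "OUT", "2024-05-01 18:10")
bad = record_swipe(conn, 7, "LUNCH", "2024-05-01 13:00")

# Daily in-out report for one day.
report = conn.execute(
    "SELECT emp_id, event, ts FROM attendance WHERE ts LIKE '2024-05-01%' ORDER BY ts"
).fetchall()
print(ok1, ok2, bad, report)
```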

11. What is the difference between a data warehouse and data mining? (100% asked Data Mining Interview Question)

Answer:

Data Warehousing:
It is a process where data is extracted from various sources, then cleansed and stored.

Data Mining:

1. It is a process that explores the data using queries.
2. The queries are used to explore a particular data set and examine the results. This helps in reporting, strategy planning and visualizing meaningful data sets.

This can be explained with a simple example:

1. Take a software company where information on all of its projects is stored. This is data warehousing.
2. Accessing a particular project and identifying the profit and loss statement for that project can be considered data mining.

12. What are the different data mining techniques?

Answer:

  1. Decision Trees: One of the most common data mining techniques because of its simple structure. The root of a decision tree acts as a condition or question with multiple answers. Each answer leads to specific data that helps determine the final decision.
  2. Sequential Patterns: Pattern analysis used to discover regular events or similar patterns in transaction data. For example, in sales, customers' historical data helps identify their past transactions over a year. Based on a customer's historic purchasing frequency, businesses can introduce the best deals or offers.
  3. Clustering: Objects with similar characteristics are grouped into clusters automatically. The clusters define classes, and each object is placed in the most suitable one.
  4. Prediction: This method discovers the relationship between independent and dependent variables. For example, in sales, to predict future profit, sales act as the independent variable and profit as the dependent variable. Based on historical sales and profit data, the associated profit is then predicted.
  5. Association: Also called the relation technique; a pattern is recognized based on the relationship between items in a single transaction. It is the recommended technique for market basket analysis, to find the products that customers frequently buy together.
  6. Classification: Based on machine learning, it classifies each item in a set into predefined groups. It uses techniques such as neural networks, linear programming and decision trees.
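To illustrate the association technique for market basket analysis, here is a minimal sketch over hypothetical baskets. It counts item pairs and keeps rules whose support (the share of baskets containing both items) meets a threshold, reporting each rule's confidence (how often the right-hand item appears when the left-hand item does). It only handles pairs; algorithms such as Apriori generalize this to larger itemsets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
]

def pair_rules(baskets, min_support=0.5):
    """Return rules (A, B, support, confidence) for item pairs meeting min_support.

    support(A, B)      = fraction of baskets containing both A and B
    confidence(A -> B) = count(A and B) / count(A)
    """
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in b)
    pair_counts = Counter(
        pair for b in baskets for pair in combinations(sorted(b), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support >= min_support:
            rules.append((a, b, support, count / item_counts[a]))
            rules.append((b, a, support, count / item_counts[b]))
    return rules

rules = pair_rules(transactions)
for lhs, rhs, sup, conf in rules:
    print(f"{lhs} -> {rhs}: support={sup:.2f}, confidence={conf:.2f}")
```

Here "milk -> butter" comes out with confidence 1.00, because every basket containing milk also contains butter.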

13. What is ROLAP? Explain with advantages and disadvantages.

Answer:
As the name suggests, the data is stored in relational databases.

The advantages of using ROLAP are:

1. As the data is stored in relational databases, it can handle huge amounts of data.
2. All the functionality of the relational database is available.

The disadvantages of using ROLAP are:

1. It is comparatively slow.
2. All the limitations of SQL also apply to ROLAP.

14. Explain the different uses of data mining.

Answer:

Following are some uses of data mining:

1. Fast Business Decisions:

Data mining helps analysts make faster business decisions, which can increase revenue and lower costs.

2. Find Patterns:

Data mining helps to understand, explore and identify patterns in data.

3. Process Automation:

Data mining automates the process of finding predictive information in large databases.

4. Hidden Pattern Finding:

Data mining helps identify previously hidden patterns.

15. Name different industries where data mining is frequently used.

Answer:

Following are different industries where data mining is frequently used:

1. Marketing
2. Advertising
3. Services
4. Artificial Intelligence
5. Government intelligence

16. Explain data mining with examples. (at least 2 examples of data mining)

Answer:

Data mining is used in various industries. Following are two examples of data mining:

1. Mobile Service Providers:

Mobile service providers apply data mining extensively to customer data. Mobile phone and utility companies use data mining and Business Intelligence to predict 'churn', the term they use for when a customer leaves their company to get their phone/gas/broadband from another provider. They collate billing information, customer service interactions, website visits and other metrics to give each customer a probability score, then target offers and incentives at customers they perceive to be at a higher risk of churning.

2. Analytics Websites like Trivago:

Various analytics websites compare the prices of products and services offered on other websites, and analytics and data mining play a big role in them. For example, Trivago gives information on hotel prices by comparing different websites; it uses data mining techniques to collect data from those websites and show the results.

17. What are the different stages of data mining? (100% asked Data Mining Interview Question)

Answer:

The following are the different stages of data mining, as defined in the CRISP-DM process model:

a. Business understanding

b. Data understanding

c. Data preparation

d. Modeling

e. Evaluation

f. Deployment

18. Explain the different stages of data mining. (100% asked Data Mining Interview Question)

Answer:

Stage 1: Exploration

Exploration is the stage where most activities revolve around collecting and preparing different data sets, so activities like cleaning and transformation are also included. Depending on the data sets available, different tools are needed to analyze the data.

Stage 2: Model Building and Validation

In this stage, different models are applied to the data sets and compared to find the best performer. This step is called pattern identification. It is a tedious process because the user has to identify which pattern is best suited for easy predictions.

Stage 3: Deployment

Based on the previous step, the best pattern is applied to the data sets and used to generate predictions, which helps in estimating expected outcomes.
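The model-building, validation and deployment stages can be sketched with two deliberately simple candidate models on hypothetical monthly sales figures (exploration and cleaning are assumed to be done already). Each candidate is built on the first part of the history, validated on held-out months with mean absolute error, and the winner is refit on all the data and used for a forecast. The model names and data are illustrative assumptions.

```python
# Hypothetical monthly sales history, already explored and cleaned (Stage 1).
sales = [100, 104, 110, 113, 120, 125, 131, 136, 140, 147]
train, valid = sales[:7], sales[7:]

def mean_model(history):
    """Candidate 1: always predict the historical mean."""
    avg = sum(history) / len(history)
    return lambda step: avg

def trend_model(history):
    """Candidate 2: extend the average month-over-month change."""
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return lambda step: history[-1] + slope * step

def mean_abs_error(model, actual):
    """Average absolute gap between forecasts (1, 2, ... steps ahead) and actual values."""
    return sum(abs(model(i + 1) - y) for i, y in enumerate(actual)) / len(actual)

# Stage 2: build each candidate on the training part and validate on held-out months.
builders = {"mean": mean_model, "trend": trend_model}
errors = {name: mean_abs_error(build(train), valid) for name, build in builders.items()}
best = min(errors, key=errors.get)

# Stage 3: deploy the winner by refitting on all data and forecasting the next month.
deployed = builders[best](sales)
next_month = deployed(1)
print(best, round(next_month, 1))  # trend 152.2
```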

19. Explain the decision tree algorithm.

Answer:

A decision tree is a tree in which every node is either a leaf node or a decision node. The tree takes an object as input and outputs a decision. The tests along each path from the root node to a leaf node are combined with AND, and when several paths lead to the same decision they are combined with OR, so a decision can be expressed using AND, OR or both. The tree is constructed using the regularities in the data. The decision tree algorithm is not affected by Automatic Data Preparation.
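A small sketch of how such a tree produces a decision, using a hypothetical hand-built loan-approval tree with `Leaf` and `DecisionNode` types introduced only for illustration. Walking from the root to a leaf ANDs together the tests on that path, and the "reject" decision is reachable by two different paths, which is the OR case. Learning the tree from data (choosing which attribute and threshold to split on) is not shown.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    decision: str

@dataclass
class DecisionNode:
    attribute: str
    threshold: float
    if_true: "Node"   # branch taken when obj[attribute] <= threshold
    if_false: "Node"  # branch taken otherwise

Node = Union[Leaf, DecisionNode]

def decide(node: Node, obj: dict) -> str:
    """Walk from the root to a leaf; the tests along the path are ANDed together."""
    while isinstance(node, DecisionNode):
        node = node.if_true if obj[node.attribute] <= node.threshold else node.if_false
    return node.decision

# Hypothetical tree: approve if income > 50 AND debt <= 10; otherwise reject.
tree = DecisionNode(
    attribute="income", threshold=50,
    if_true=Leaf("reject"),
    if_false=DecisionNode("debt", 10, if_true=Leaf("approve"), if_false=Leaf("reject")),
)

print(decide(tree, {"income": 80, "debt": 5}))   # approve
print(decide(tree, {"income": 80, "debt": 20}))  # reject
print(decide(tree, {"income": 30, "debt": 0}))   # reject
```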

20. Why is data warehouse tuning needed? Explain.

Answer:

Performance tuning is needed in a data warehouse because of its huge volume of data. A data warehouse holds very large amounts of historical as well as current data, so it is difficult to fetch specific pattern information within a specified time. A key aspect of a data warehouse is that its data evolves over time, and its behavior is difficult to predict because of its ad hoc query environment. Tuning it is therefore much more difficult than tuning an OLTP environment, because of these ad hoc and real-time loads. Due to this nature, data warehouse tuning is necessary, and it changes how the data is utilized based on need.

21. What is cluster analysis in Data Mining? (100% asked Data Mining Interview Question)

Answer:

Cluster analysis is used to group sets of data with similar characteristics, called clusters. These clusters help in making faster decisions and exploring data. The algorithm first identifies relationships in a data set and then generates a series of clusters based on those relationships. The process of creating clusters is iterative: the algorithm refines the groupings to create clusters that better represent the data.
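k-means is one common clustering algorithm that follows exactly this iterative pattern; the sketch below is a plain-Python version, not the specific algorithm any particular tool uses. Each point is assigned to its nearest centroid, each centroid moves to the mean of its points, and the loop repeats until the assignments stop changing. The sample points are hypothetical.

```python
import math
import random

def k_means(points, k, iterations=100, seed=0):
    """Plain k-means: assign points to the nearest centroid, then move each centroid
    to the mean of its points; repeat until the assignments stop changing."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(iterations):
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: the groupings no longer change
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignment

# Hypothetical 2-D data with two obvious groups.
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, labels = k_means(points, k=2)
print(labels)
```

The cluster numbers themselves are arbitrary; what matters is that the first three points share one label and the last three share the other.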

22. How do data warehousing and data mining work together?

Answer:

The following points explain the relationship between data warehousing and data mining:

Data Mining:

1. Data mining extracts useful information from large amounts of data for business intelligence purposes.
2. Data mining is used to predict the future, and the data warehouse is the source for data mining.

Data Warehousing:

1. In data warehousing, data is extracted from various sources and transformed into the required form. This data is then loaded into the data warehouse.
2. Historical data is stored in the data warehouse, and business users perform business analysis on it.

23. What are the different types of data mining?

Answer:

Following are the steps of the knowledge discovery process, of which data mining is one:

a. Data cleaning

b. Integration

c. Selection

d. Data transformation

e. Data mining

f. Pattern evaluation

g. Knowledge representation

24. What are the technological drivers required in data mining?

Answer:

Database size: To maintain and process huge amounts of data, we need powerful systems.

Query complexity: To analyze complex queries and large numbers of queries, we need more powerful systems.

25. What are the most common issues in the data mining process?

Answer:

The following are the most common issues that any serious data mining package needs to address:

Uncertainty Handling

Dealing with Missing Values

Dealing with Noisy Data

Efficiency of Algorithms

Constraining Discovered Knowledge to Only Useful Knowledge

Incorporating Domain Knowledge

Size and Complexity of Data

Data Selection

Understandability of Discovered Knowledge

Consistency between Data and Discovered Knowledge
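Two of these issues, missing values and noisy data, can be illustrated with a short sketch on a hypothetical list of ages. Missing entries are filled with the median of the observed values (one simple strategy among many, chosen here because the median is not dragged up by the outlier), and values far from the median, measured in units of the median absolute deviation (MAD), are flagged as likely noise. The threshold `k=3.0` is an arbitrary illustrative choice.

```python
from statistics import median

def impute_missing(values):
    """Replace None with the median of the observed values."""
    fill = median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]

def flag_noise(values, k=3.0):
    """Flag values more than k median absolute deviations (MAD) away from the median."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return [False] * len(values)
    return [abs(v - med) / mad > k for v in values]

ages = [25, 31, None, 29, 27, 240, None]  # 240 looks like a data-entry error
clean = impute_missing(ages)
flags = flag_noise(clean)
print(clean)  # [25, 31, 29, 29, 27, 240, 29]
print(flags)  # only the 240 entry is flagged
```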

26. Explain the capabilities concept in Data Mining.

Answer:

Data mining is used to examine or explore data using queries, which can be run on the data warehouse. Exploring the data helps in reporting, planning strategies, finding meaningful patterns, etc. It is most commonly used to transform large amounts of data into a meaningful form. Here, data can be facts, numbers or any real-time information such as sales figures, costs, metadata, etc., while information is the patterns and relationships among the data.

27. Explain data aggregation and data generalization.

Answer:

Data Aggregation:
As the name suggests, the data is aggregated together so that a cube can be constructed for data analysis.

Data Generalization:
It is a process where low-level data is replaced by higher-level concepts so that the data becomes generalized and meaningful.
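The two ideas work well together, as this sketch on hypothetical sales records shows. Generalization replaces low-level values with higher-level concepts using assumed concept hierarchies (city to region, exact age to age group), and aggregation then sums the amounts per (region, age group) cell, much like one face of a data cube. The hierarchies and age cut-offs are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical low-level records: (city, age, amount)
sales = [("Pune", 23, 120.0), ("Mumbai", 37, 80.0), ("Pune", 45, 200.0), ("Delhi", 19, 50.0)]

# Assumed concept hierarchies: city -> region, age -> age group.
REGION = {"Pune": "West", "Mumbai": "West", "Delhi": "North"}

def age_group(age):
    if age < 30:
        return "young"
    if age < 60:
        return "middle-aged"
    return "senior"

def generalize(records):
    """Generalization: replace low-level values with higher-level concepts."""
    return [(REGION[city], age_group(age), amount) for city, age, amount in records]

def aggregate(records):
    """Aggregation: total amount per (region, age group) cell."""
    totals = defaultdict(float)
    for region, group, amount in records:
        totals[(region, group)] += amount
    return dict(totals)

summary = aggregate(generalize(sales))
print(summary)
# {('West', 'young'): 120.0, ('West', 'middle-aged'): 280.0, ('North', 'young'): 50.0}
```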

28. What are the different levels of analysis in data mining?

Answer:

a. Artificial neural networks

b. Genetic algorithms

c. Nearest neighbor method

d. Rule induction

e. Data visualization
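Of these, the nearest neighbor method is the easiest to show in a few lines. The sketch below labels a new record by majority vote among the `k` closest training records under Euclidean distance; the customer-segment data is hypothetical.

```python
import math
from collections import Counter

def knn_predict(training, query, k=3):
    """Nearest neighbor method: label a new record by majority vote of the k closest
    training records, using Euclidean distance."""
    nearest = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training data: ((monthly_spend, visits_per_month), segment)
training = [
    ((20, 1), "low"), ((25, 2), "low"), ((30, 1), "low"),
    ((200, 8), "high"), ((220, 10), "high"), ((180, 7), "high"),
]
print(knn_predict(training, (28, 2)))   # low
print(knn_predict(training, (190, 9)))  # high
```

In practice the attributes are usually scaled first, since otherwise the attribute with the largest range (spend, here) dominates the distance.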

29. What is machine learning?

Answer:

Machine learning generally covers automatic computing procedures, originally based on logical or binary operations, that learn a task from a series of examples.

Here, we focus on decision-tree approaches, where classification results come from a sequence of logical steps.

Its principles also allow us to deal with more general types of data, including cases where the number and type of attributes may vary.

30. What is STING?

Answer:

STING (Statistical Information Grid) is a grid-based, multi-resolution clustering method. In STING, the objects are placed into rectangular cells, the cells are kept at various levels of resolution, and these levels are arranged in a hierarchical structure.
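A simplified sketch of STING's grid hierarchy on hypothetical `(x, y, value)` points: the bottom level places points into unit-sized rectangular cells and records a count and sum per cell, and each coarser level combines a 2x2 block of child cells, computing its statistics from the children rather than from the raw points. Real STING keeps more per-cell statistics (such as min, max, standard deviation and distribution type) and answers queries top-down; only count and mean are shown here.

```python
from collections import defaultdict

def bottom_level(points, cell_size):
    """Finest level: put each (x, y, value) point into a rectangular cell and
    record [count, sum of value] for that cell."""
    stats = defaultdict(lambda: [0, 0.0])
    for x, y, value in points:
        cell = (int(x // cell_size), int(y // cell_size))
        stats[cell][0] += 1
        stats[cell][1] += value
    return dict(stats)

def parent_level(child_stats):
    """Next coarser level: each parent cell covers a 2x2 block of child cells,
    and its statistics are computed from the children, not from the raw points."""
    stats = defaultdict(lambda: [0, 0.0])
    for (cx, cy), (count, total) in child_stats.items():
        parent = (cx // 2, cy // 2)
        stats[parent][0] += count
        stats[parent][1] += total
    return dict(stats)

def build_hierarchy(points, cell_size=1.0, levels=3):
    """Return the levels from finest to coarsest."""
    hierarchy = [bottom_level(points, cell_size)]
    for _ in range(levels - 1):
        hierarchy.append(parent_level(hierarchy[-1]))
    return hierarchy

# Hypothetical points: (x, y, measured value)
points = [(0.2, 0.3, 10.0), (0.8, 0.1, 14.0), (1.5, 0.4, 6.0), (3.2, 3.9, 30.0)]
levels = build_hierarchy(points)
for depth, level in enumerate(levels):
    means = {cell: total / count for cell, (count, total) in sorted(level.items())}
    print(depth, means)
```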

These are some important Data Mining Interview Questions. I hope you like this article on data mining interview questions. Please leave a comment in the comment section if you have any suggestions.