In my previous article I explained the Star schema and Snowflake schema with real industry examples. The current world runs on Big Data technology, so in this article I would like to start with Big Data and the Hadoop framework. I will begin with what exactly Hadoop is, which will give you an idea about Hadoop, and then I will explain the Hadoop framework with its architecture and a description of each of its components.
What is Hadoop?
In this section I would like to explain what Hadoop is and how it is used.
Hadoop is a software framework that uses a network of many computers to handle problems requiring large amounts of computation and data. Because the data can be structured or unstructured, it offers flexibility for gathering, processing, analyzing, and managing data. It is an open-source distributed framework for the distributed storage, management, and processing of big data applications across scalable clusters of servers.
Big Data, meaning structured, semi-structured, and unstructured data of exceptionally high volume, can be processed, handled, and combined using Hadoop. Hadoop is a solution to the Big Data problem: it seeks to take advantage of the opportunities offered by big data while overcoming its obstacles. It is a Java-based open-source programming framework that manages the processing of large data sets in a distributed, clustered environment.
What is the Hadoop Framework and what are its components?
There are four components in the Hadoop framework:
- Map Reduce
- Hadoop Distributed File System (HDFS)
- YARN Framework
- Common Utilities
1. Hadoop Framework - Map Reduce:
Map Reduce solves the problem of processing huge data. Let's work through a practical example to understand it. ABC Company wants to determine its total sales per city. The data runs into terabytes, so a simple hash-table approach on a single machine won't work; this is exactly where the Map-Reduce technique comes in.
Two stages are involved:
a) Map: First, we divide the data into smaller portions based on key/value pairs, handled by mappers. The city name serves as the key, and the sales figure serves as the value. Each mapper is given a month's worth of data, containing city names and their related sales.
b) Reduce: The reducers receive these stacks of mapped data, with each reducer in charge of the cities in one region: North, West, East, or South.
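The two stages above can be sketched in plain Python. This is a minimal conceptual sketch, not actual Hadoop code: the sample records and city names are made up, and the `shuffle` step stands in for the grouping the framework performs between the map and reduce stages.

```python
from collections import defaultdict

# Hypothetical monthly sales records, as the mappers would receive them.
records = [
    ("Pune", 120), ("Mumbai", 200), ("Pune", 80),
    ("Delhi", 150), ("Mumbai", 50), ("Delhi", 100),
]

def mapper(record):
    """Map stage: emit a (key, value) pair -- city name as key, sales as value."""
    city, sales = record
    return (city, sales)

def shuffle(mapped_pairs):
    """Group all values for the same key, as the framework does between stages."""
    groups = defaultdict(list)
    for city, sales in mapped_pairs:
        groups[city].append(sales)
    return groups

def reducer(city, sales_list):
    """Reduce stage: aggregate the total sales for one city."""
    return (city, sum(sales_list))

mapped = [mapper(r) for r in records]
totals = dict(reducer(city, values) for city, values in shuffle(mapped).items())
print(totals)  # {'Pune': 200, 'Mumbai': 250, 'Delhi': 250}
```

On a real cluster, the mappers and reducers run in parallel on different machines; the sequential loop here only illustrates the data flow.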
2. Hadoop Distributed File System (HDFS):
Fault tolerance is built into the Hadoop Distributed File System (HDFS). Each data block is stored in three copies on different nodes and server racks, so the system as a whole is barely affected if a node, or even an entire rack, fails.
DataNodes store and process the data blocks, while NameNodes oversee the numerous DataNodes, keep track of data block metadata, and govern client access.
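The replication idea can be illustrated with a toy placement model. This is a simplified sketch, not HDFS's real rack-aware placement policy: the node names are invented, and blocks are simply assigned round-robin so that each block lands on three distinct nodes (the default replication factor).

```python
import itertools

DATANODES = ["node1", "node2", "node3", "node4", "node5"]  # hypothetical cluster
REPLICATION = 3  # HDFS default replication factor

def place_blocks(num_blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin style."""
    placement = {}
    ring = itertools.cycle(nodes)
    for block_id in range(num_blocks):
        placement[block_id] = [next(ring) for _ in range(replication)]
    return placement

placement = place_blocks(4, DATANODES)

# Every block survives the failure of any single node,
# because its other two replicas live elsewhere.
failed = "node2"
for block_id, replicas in placement.items():
    survivors = [n for n in replicas if n != failed]
    assert survivors, f"block {block_id} would be lost"
print(placement)
```

Real HDFS additionally spreads replicas across racks, so that even a whole-rack failure leaves at least one copy reachable.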
3. YARN Framework:
The first version of Hadoop had only two components: Map Reduce and HDFS. It was later discovered that Map Reduce could not handle several big data problems. The plan was to replace the old map-reduce engine with a new component responsible for resource management and job scheduling, and this is how YARN (Yet Another Resource Negotiator) came to be. It serves as the intermediate layer between HDFS and Map Reduce and is in charge of managing the cluster's resources.
Because Hadoop 1.0 relied on MapReduce to process large datasets, it had several drawbacks, including scalability limits and batch-processing delays, despite being quite effective at data processing and computation. With YARN, Hadoop supports a variety of processing models and a wider range of applications: YARN clusters can now run interactive queries and streaming workloads alongside batch MapReduce jobs. The YARN framework fixes the issues of Hadoop 1.0 by supporting applications other than MapReduce.
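YARN's resource-management role can be sketched conceptually. This is not the real YARN API; it is a toy model, under the assumption that a ResourceManager tracks free cluster memory and grants "containers" to applications only while capacity remains, which is the scheduling responsibility YARN took over from the original MapReduce engine.

```python
class ResourceManager:
    """Toy stand-in for YARN's ResourceManager: tracks capacity, grants containers."""

    def __init__(self, total_memory_mb):
        self.free_mb = total_memory_mb
        self.granted = []  # list of (app_name, container_mb)

    def request_container(self, app_name, memory_mb):
        """Grant a container if capacity allows; otherwise reject the request."""
        if memory_mb <= self.free_mb:
            self.free_mb -= memory_mb
            self.granted.append((app_name, memory_mb))
            return True
        return False

# Different application types share one cluster, as YARN allows.
rm = ResourceManager(total_memory_mb=4096)
print(rm.request_container("mapreduce-job", 2048))   # True
print(rm.request_container("interactive-query", 1024))  # True
print(rm.request_container("batch-etl", 2048))       # False: only 1024 MB left
print(rm.free_mb)  # 1024
```

The point of the sketch is that scheduling is decoupled from any one processing engine: the ResourceManager does not care whether the requester is a MapReduce job or an interactive query.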
4. Common Utilities:
Common Utilities is also called Hadoop Common. It consists of the Java libraries, files, scripts, and utilities that the other Hadoop components need in order to function.
I hope this article gave you a high-level idea of the Hadoop framework and its components. In upcoming articles I would like to cover these four components in more detail, and also throw light on the usages of Hadoop and where you can't use Hadoop. If you liked this article, or if you have any issues with it, kindly comment in the comments section.