preview

Hadoop Distributed File System Analysis

Better Essays

HADOOP DISTRIBUTED FILE SYSTEM

Abstract - Hadoop Distributed File System, a Java based file system provides reliable and scalable storage for data. It is the key component to understand how a Hadoop cluster can be scaled over hundreds or thousands of nodes. The large amounts of data in Hadoop cluster is broken down to smaller blocks and distributed across small inexpensive servers using HDFS. Now, MapReduce functions are executed on these smaller blocks of data thus providing the scalability needed for big data processing. In this paper I will discuss in detail on Hadoop, the architecture of HDFS, how it functions and the advantages.

I. INTRODUCTION
Over the years it has become very essential to process large amounts of data with high precision and speed. This large amounts of data that can no more be processed using the Traditional Systems is called Big Data. Hadoop, a Linux based tools framework addresses three main problems faced when processing Big Data which the Traditional Systems cannot. The first problem is the speed of the data flow, the second is the size of the data and the last one is the format of data. Hadoop divides the data and computation into smaller pieces, sends it to different computers, then gathers the results to combine them and sends it to the application. This is done using Map Reduce and HDFS i.e., Hadoop Distributed File System. The data node and the name node part of the architecture fall under HDFS.

II. ARCHITECTURE
Hadoop works on

Get Access