
Working with HDFS - Hadoop Distributed File System

Architecture:

The core idea of HDFS is to split a file into blocks and store multiple copies of each block on different nodes. The advantage of this is that multiple operations can run on the blocks in parallel and the results can be aggregated later; keeping multiple copies also provides fault tolerance.

An HDFS cluster has the following node types:

1.) NameNode - Stores the metadata of each file, such as how many blocks the file is split into and which nodes hold copies of those blocks. It also decides how to split a file and where to store the blocks.

2.) Secondary NameNode - Despite the name, this is not a failover node for the NameNode. It periodically merges the NameNode's edit log into the filesystem image (checkpointing), which keeps the edit log small and speeds up NameNode restarts.

3.) DataNode - There can be any number of these, depending on requirements. These nodes store the actual data and send heartbeats to the NameNode periodically.

In a clustered Hadoop environment, each node runs on a separate physical machine. In a pseudo-distributed environment (like the Cloudera VM), all the nodes run on the same machine.
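To see these roles in action, you can ask the NameNode for a cluster report, which lists each DataNode along with its capacity and last heartbeat. A quick check (on some setups this must be run as the hdfs superuser, as shown here):

sudo -u hdfs hdfs dfsadmin -report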


How to start & stop the HDFS services:

On launching the Cloudera VM, all the Hadoop services are started by default. In case you are not able to access HDFS, use the commands below to start the HDFS services.

Start HDFS services:

On the NameNode: sudo service hadoop-hdfs-namenode start
On the Secondary NameNode (if used): sudo service hadoop-hdfs-secondarynamenode start
On each DataNode: sudo service hadoop-hdfs-datanode start
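Each of these service scripts also supports a status action, which is handy for checking a single daemon, e.g.:

sudo service hadoop-hdfs-namenode status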

In case you want to stop the HDFS services, use the commands below.

Stop HDFS services:

On the NameNode: sudo service hadoop-hdfs-namenode stop
On the Secondary NameNode (if used): sudo service hadoop-hdfs-secondarynamenode stop
On each DataNode: sudo service hadoop-hdfs-datanode stop
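To verify which HDFS daemons are actually running, list the Java processes with jps (bundled with the JDK on the Cloudera VM); when the services are up, the output should include NameNode, SecondaryNameNode and DataNode:

sudo jps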

Below are the commands to work with HDFS:

1.) To list the files in HDFS:

hadoop fs -ls /
hadoop fs -ls /user
hadoop fs -ls /user/cloudera
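To list a directory tree recursively, add the -R flag to -ls (supported in Hadoop 2 and later):

hadoop fs -ls -R /user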

2.) To put a file in HDFS:

Create a file testhdfs.txt in the local file system:

gedit testhdfs.txt
Enter some random data in this file and save it.

hadoop fs -put testhdfs.txt /user/cloudera
This will copy the file from the local file system to HDFS.

hadoop fs -ls /user/cloudera 
This will list the newly added file.
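Equivalently, -copyFromLocal behaves like -put but accepts only local sources. You can also create a target directory in HDFS first (testdir below is just an example name):

hadoop fs -mkdir /user/cloudera/testdir
hadoop fs -copyFromLocal testhdfs.txt /user/cloudera/testdir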

3.) To view the file in HDFS:

hadoop fs -cat /user/cloudera/testhdfs.txt

hadoop fs -tail /user/cloudera/testhdfs.txt
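-cat streams the entire file, while -tail shows only the last kilobyte. To copy the file back from HDFS to the local file system (the local name below is arbitrary):

hadoop fs -get /user/cloudera/testhdfs.txt testhdfs_copy.txt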


4.) To change the file permissions:

hadoop fs -chmod 777 /user/cloudera/testhdfs.txt
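You can confirm the new permissions with -ls. Ownership is changed in the same style with -chown, which typically requires the HDFS superuser:

hadoop fs -ls /user/cloudera/testhdfs.txt
sudo -u hdfs hadoop fs -chown cloudera:cloudera /user/cloudera/testhdfs.txt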

5.) To remove the file from HDFS:

hadoop fs -rm /user/cloudera/testhdfs.txt

Note: The file will still be available in your local file system.
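To remove a directory and its contents, add the -r flag (older Hadoop releases used -rmr instead):

hadoop fs -rm -r /user/cloudera/testdir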

File Explorer:

Hadoop provides a browser-based file explorer, which lets us browse the Hadoop file system without typing commands. It is part of the NameNode web UI, available at the URL below.

http://localhost:50070/explorer.html

Details such as block sizes, storage locations and data replication can be explored from the file explorer.
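The same block-level details are also available from the command line via fsck, which prints each block of a file together with the DataNodes holding its replicas:

hdfs fsck /user/cloudera/testhdfs.txt -files -blocks -locations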

Change Replication and Block Size:

The following file contains the Hadoop configuration:

/usr/lib/hadoop/etc/hadoop/hdfs-site.xml

or

/etc/hadoop/conf/hdfs-site.xml

The following properties can be altered:

- Block size (for 256 MB blocks, i.e. 256 × 1024 × 1024 = 268435456 bytes):

<property>
  <name>dfs.block.size</name>
  <value>268435456</value>
</property>

- Replication:

<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>

If the above properties do not exist in the file, you can add them. Changes to hdfs-site.xml take effect only after the HDFS services are restarted, and the block size applies only to files written after the change. (On newer Hadoop releases the block size property is named dfs.blocksize; dfs.block.size is still accepted as a deprecated alias.)
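Replication can also be changed per file without editing the configuration: -setrep adjusts an existing file, and a -D generic option overrides the value for a single upload. A small sketch, reusing the testhdfs.txt file from above (testhdfs2.txt is just an example target name):

hadoop fs -setrep 2 /user/cloudera/testhdfs.txt
hadoop fs -D dfs.replication=2 -put testhdfs.txt /user/cloudera/testhdfs2.txt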
