Architecture:
The core idea of HDFS is to split a file into blocks and store multiple copies of each block on different nodes. This has two advantages: operations can run on the blocks in parallel and the results can be aggregated later, and keeping multiple copies provides fault tolerance.
HDFS has the following types of nodes:
1.) Name node - This stores the metadata of each file, such as how many blocks the file is split into and which nodes hold the copies. It also decides how to split a file and where to store the blocks.
2.) Secondary Name node - Despite its name, this is not a failover node for the Name node. It periodically merges the Name node's edit log into the filesystem image (checkpointing), which keeps the Name node's metadata compact and its restarts fast.
3.) Data Node - There can be any number of these, depending on the requirement. These nodes actually store the data, and they send heartbeats to the Name node periodically.
In a clustered Hadoop environment, each node would run on its own physical machine. In a pseudo-distributed environment (like the Cloudera VM), all the nodes run on the same machine.
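To actually see how a file has been split into blocks and where the replicas live, the fsck tool can be used. Below is a sketch assuming the file /user/cloudera/testhdfs.txt (created later in this post) already exists in HDFS; depending on permissions you may need to run it as the hdfs superuser, as shown here:
sudo -u hdfs hdfs fsck /user/cloudera/testhdfs.txt -files -blocks -locations
The output lists each block of the file along with the data nodes holding its replicas.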
How to start & stop the HDFS services:
On launching the Cloudera VM, all the Hadoop services are started by default. In case you are not able to access HDFS, use the commands below to start the HDFS services.
Start HDFS services:
On the NameNode:
sudo service hadoop-hdfs-namenode start
On the Secondary NameNode (if used):
sudo service hadoop-hdfs-secondarynamenode start
On each DataNode:
sudo service hadoop-hdfs-datanode start
In case you want to stop the HDFS services, use the commands below.
Stop HDFS services:
On the NameNode:
sudo service hadoop-hdfs-namenode stop
On the Secondary NameNode (if used):
sudo service hadoop-hdfs-secondarynamenode stop
On each DataNode:
sudo service hadoop-hdfs-datanode stop
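A quick way to verify that the daemons are actually running is jps (part of the JDK), which lists Java processes; run as root it shows the JVMs of all users, so on the Cloudera VM you would expect to see entries like NameNode, SecondaryNameNode and DataNode:
sudo jps
If a daemon is missing from the list, start it with the corresponding service command above.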
Below are the commands to work with HDFS:
1.) To list the files in HDFS:
hadoop fs -ls /
hadoop fs -ls /user
hadoop fs -ls /user/cloudera
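Each of these prints the contents of the given HDFS directory. The output format is similar to ls -l on Linux; for a file it looks roughly like the illustrative line below, where the second column is the replication factor:
-rw-r--r--   1 cloudera cloudera         45 2016-05-10 09:30 /user/cloudera/testhdfs.txt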
2.) To put a file in HDFS:
Create a file testhdfs.txt in the local file system:
gedit testhdfs.txt
Enter some random data in this file and save it.
hadoop fs -put testhdfs.txt /user/cloudera
This will copy the file from the local file system to HDFS.
hadoop fs -ls /user/cloudera
This will list the file which was just added.
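The reverse operation, copying a file from HDFS back to the local file system, is done with -get. The local file name testhdfs_copy.txt below is just an illustrative choice:
hadoop fs -get /user/cloudera/testhdfs.txt testhdfs_copy.txt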
3.) To view the file in HDFS:
hadoop fs -cat /user/cloudera/testhdfs.txt
hadoop fs -tail /user/cloudera/testhdfs.txt
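-cat streams the whole file, which is inconvenient for large files, while -tail prints only the last kilobyte. To peek at just the beginning of a large file, the output of -cat can be piped through head, for example:
hadoop fs -cat /user/cloudera/testhdfs.txt | head -n 5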
4.) To change the file permissions:
hadoop fs -chmod 777 /user/cloudera/testhdfs.txt
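The change can be verified with -ls, and any octal mode can be used; for example, to make the file world-readable but writable only by the owner:
hadoop fs -chmod 644 /user/cloudera/testhdfs.txt
hadoop fs -ls /user/cloudera/testhdfs.txt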
5.) To remove the file from HDFS:
hadoop fs -rm /user/cloudera/testhdfs.txt
Note: The file will still be available in your local file system.
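If the HDFS trash feature is enabled (the fs.trash.interval property is set to a value greater than 0), -rm only moves the file into a trash directory. To delete it immediately and bypass the trash, add the -skipTrash flag:
hadoop fs -rm -skipTrash /user/cloudera/testhdfs.txt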
File Explorer:
Hadoop provides a browser-based file explorer, which lets us browse the Hadoop file system without using hadoop commands. Below is the URL for the Hadoop file explorer.
http://localhost:50070/explorer.html
Details such as block size, storage locations, and data replication can be explored from the file explorer.
Change Replication and Block size:
The following file holds the configuration related to HDFS:
/usr/lib/hadoop/etc/hadoop/hdfs-site.xml
or
/etc/hadoop/conf/hdfs-site.xml
The following properties can be altered:
- Block size (for 256 MB blocks; 256 * 1024 * 1024 = 268435456 bytes):
<property>
<name>dfs.block.size</name>
<value>268435456</value>
</property>
(In Hadoop 2.x this property has been renamed to dfs.blocksize; the old name still works but is deprecated.)
- Replication:
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
If these properties are not already present in the file, you can add them. Note that they apply only to files written after the change; existing files keep their current block size and replication factor.
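To change the replication factor of a file that already exists in HDFS, the -setrep command can be used; the -w flag makes it wait until re-replication is complete. For example, for the test file created above:
hadoop fs -setrep -w 2 /user/cloudera/testhdfs.txt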