Big Data

Showing posts from June, 2016

Working with HDFS - Hadoop Distributed File System

Architecture: The concept of HDFS is to split a file into blocks and save multiple copies of each block on different nodes. The advantage is that operations can run on the blocks in parallel and the results can be aggregated later; keeping multiple copies also provides fault tolerance. HDFS has the following nodes:

1.) Name node - This stores the metadata of the file, such as how many blocks the file is split into and which nodes hold the copies. It also governs how the file is split and where the blocks are stored.
2.) Secondary Name node - Despite its name, this is not a hot standby for the Name node. It periodically merges the Name node's edit log into the filesystem image (checkpointing), which helps when recovering from a Name node failure.
3.) Data Node - There can be any number of these, depending on the requirement. These nodes actually store the data, and they periodically send heartbeats to the Name node.

In a clustered Hadoop environment, each node would be on a physical machine. In a pseudo-cluster environment (like the Cloudera VM) all the nodes would run on the same...
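The block-and-replica idea can be illustrated with a small Python sketch. This is a toy model only, not the HDFS API: the function names, the node names (dn1, dn2, dn3), the tiny block size and the round-robin placement are all assumptions made for illustration. Real HDFS uses far larger blocks and a rack-aware placement policy.

```python
# Toy model of HDFS-style block splitting and replication.
# All names here are hypothetical; nothing below calls real Hadoop code.

def split_into_blocks(data: bytes, block_size: int) -> list:
    """Cut the file contents into fixed-size blocks (the last may be shorter)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list, replication: int) -> dict:
    """Build name-node-style metadata: block id -> data nodes holding a copy.

    Assumption: simple round-robin placement, chosen only for readability.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of data nodes")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


def read_file(blocks: list, placement: dict, live_nodes: set) -> bytes:
    """Reassemble the file, failing if any block has no live replica."""
    parts = []
    for block_id, block in enumerate(blocks):
        if not any(node in live_nodes for node in placement[block_id]):
            raise IOError(f"block {block_id} has no live replica")
        parts.append(block)
    return b"".join(parts)


if __name__ == "__main__":
    data = b"abcdefghij"
    blocks = split_into_blocks(data, block_size=4)
    placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3"], replication=2)
    print(placement)
    # With dn2 down, every block still has one surviving copy.
    print(read_file(blocks, placement, live_nodes={"dn1", "dn3"}))
```

With two copies of every block, the loss of any single data node still leaves the file readable, which is the fault tolerance described above; losing two nodes in this toy setup makes at least one block unavailable.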

Cloudera setup

Installing Cloudera is one of the best ways to kick-start the cloud setup. Follow the steps below to set up Cloudera on your Windows machine:

1.) Download VMware Player, which is used to open the Cloudera machine from your Windows machine. Link: https://www.vmware.com/products/player/playerpro-evaluation.html Install VMware Player.
2.) Download the Cloudera VM. Complete the sign-up required to download it. Link: http://www.cloudera.com/downloads.html Click on Quick Starts from the above link, select the latest version and VMware, and click Download. Approximately 5 GB of data will be downloaded, so sit back and relax. Once the Cloudera VM download completes, extract the downloaded zip file to a convenient location.

Launching the VM

1.) Open VMware Player and click on Open a Virtual Machine. Open the VM from the path where you extracted the Cloudera VM. ...