
Launching a MapReduce Job

Use Case :

A supermarket records its sales in a file. Whenever an item is sold, the name of the item, the number of units sold, and the cost of each unit are written as one line in comma-separated format.

A sample file looks like this:

Apple,10,10
Mango,20,5
Guava,10,3
Banana,30,4
Apple,10,5

At the end of the day we need to find the total sales for each item, that is, units multiplied by unit cost, summed over all records for that item.

Expected Output :

Apple 150
Mango 100
Guava 30
Banana 120
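The calculation behind the expected output can be sketched in plain Java, without any Hadoop classes. This is only an illustration of the map-then-sum idea, not the contents of SalesPerItem.java: the class name SalesPerItemSketch and the methods parseSale and totalSales are hypothetical, and the sketch assumes every record has exactly three comma-separated fields with integer units and cost.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone sketch; it does not use the Hadoop API.
public class SalesPerItemSketch {

    // "Map" step: turn one record such as "Apple,10,10" into (item, units * unitCost).
    static Map.Entry<String, Integer> parseSale(String line) {
        String[] fields = line.split(",");
        String item = fields[0].trim();
        int units = Integer.parseInt(fields[1].trim());
        int unitCost = Integer.parseInt(fields[2].trim());
        return new AbstractMap.SimpleEntry<>(item, units * unitCost);
    }

    // "Reduce" step: add up the sale amounts that share the same item name.
    static Map<String, Integer> totalSales(List<String> lines) {
        // LinkedHashMap keeps items in the order they first appear in the input.
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            Map.Entry<String, Integer> sale = parseSale(line);
            totals.merge(sale.getKey(), sale.getValue(), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "Apple,10,10", "Mango,20,5", "Guava,10,3", "Banana,30,4", "Apple,10,5");
        totalSales(sample).forEach((item, total) -> System.out.println(item + " " + total));
    }
}
```

Running main on the sample records prints the four totals listed above. The two Apple records (10 x 10 and 10 x 5) are combined into a single total of 150.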

Input file :

Download the input file from the location below:

https://github.com/nachiketagudi/Map-Reduce/blob/master/SampleInput.txt

Place this file at /home/cloudera.

Map Reduce Executable File :

Download the Java file from the location below:

https://github.com/nachiketagudi/Map-Reduce/blob/master/SalesPerItem.java


In the Cloudera VM, open Eclipse, create a new Java project, and name it MapReducePractice. In that project, create a new Java class named SalesPerItem in the package "com.nachiketa.mapreduce.example".

After creating the class, copy the contents of the downloaded SalesPerItem.java file into it.

Set up the class path :

To set up the class path, follow the steps below.

Right-click on the project MapReducePractice and go to Properties >> Java Build Path.

Click on Add External JARs, navigate to /usr/lib/hadoop/client, select all the jars, and click OK to add them. Click OK to exit. The compilation errors should now disappear.

Export the jar:

To export the jar of the created project, follow the steps below.

Right-click on the project MapReducePractice and go to Export >> JAR file.

Under "Select the export destination", give the jar file name with its complete path:

/home/cloudera/MapReducePractice.jar

Setting up the input paths:

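Use the commands below to create the input directory in HDFS and copy the sample file into it. The output directory is created by the job itself and must not exist beforehand.

hadoop fs -mkdir /user/cloudera/mapreduce_input

hadoop fs -put /home/cloudera/SampleInput.txt /user/cloudera/mapreduce_input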

Launch the mapreduce job:

Go to the directory where MapReducePractice.jar was exported, i.e. /home/cloudera/. The general form of the command is:

hadoop jar <jar name> <class name with complete package> <input path> <output path>

For this example, run:

hadoop jar MapReducePractice.jar com.nachiketa.mapreduce.example.SalesPerItem /user/cloudera/mapreduce_input /user/cloudera/mapreduce_output

Output :

hadoop fs -cat /user/cloudera/mapreduce_output/part-00000

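Run this command to check the output. If the job is written against the newer org.apache.hadoop.mapreduce API, the reducer output file is named part-r-00000 instead.

To rerun this job, first remove the mapreduce_output folder with the command below, then run the hadoop jar command above again.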

hadoop fs -rm -R /user/cloudera/mapreduce_output
