Launching Map Reduce Job

Use Case :

A supermarket records its sales in a file. Whenever an item is sold, the name of the item, the number of units sold and the cost of each unit are written as one line in comma-separated format.

A sample file looks like this:

Apple,10,10
Mango,20,5
Guava,10,3
Banana,30,4
Apple,10,5

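At the end of the day we need to find the total sales for each item, i.e. the sum of (units sold x unit cost) over all records of that item. For example, Apple appears twice, so its total is 10 x 10 + 10 x 5 = 150.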

Expected Output :

Apple 150
Mango 100
Guava 30
Banana 120
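
To make the calculation concrete, here is a minimal plain-Java sketch of the same logic. It does not use any Hadoop classes and it is not the SalesPerItem.java used in this post; the names SalesTotalsSketch, mapRecord and totalSales are made up for illustration. The "map" step turns each record into an (item, units x cost) pair and the "reduce" step adds up the values per item. The sketch keeps items in first-seen order to match the listing above; a real job's output is normally sorted by key, so the order of the lines may differ.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only (assumed names, no Hadoop dependency).
class SalesTotalsSketch {

    // "Map" step: turn one record "item,units,unitCost" into (item, units * unitCost).
    static Map.Entry<String, Long> mapRecord(String line) {
        String[] fields = line.trim().split(",");
        if (fields.length != 3) {
            throw new IllegalArgumentException("Expected item,units,unitCost but got: " + line);
        }
        long units = Long.parseLong(fields[1].trim());
        long unitCost = Long.parseLong(fields[2].trim());
        return new AbstractMap.SimpleEntry<String, Long>(fields[0].trim(), units * unitCost);
    }

    // "Reduce" step: add up the mapped values for each item, keeping first-seen order.
    static Map<String, Long> totalSales(List<String> lines) {
        Map<String, Long> totals = new LinkedHashMap<String, Long>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            Map.Entry<String, Long> pair = mapRecord(line);
            Long soFar = totals.get(pair.getKey());
            totals.put(pair.getKey(), (soFar == null ? 0L : soFar) + pair.getValue());
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "Apple,10,10", "Mango,20,5", "Guava,10,3", "Banana,30,4", "Apple,10,5");
        // Prints one "item total" line per item: Apple 150, Mango 100, Guava 30, Banana 120.
        for (Map.Entry<String, Long> e : totalSales(sample).entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

Running main on the five sample records prints the four lines shown under Expected Output.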

Input file :

Download the input file from the location below:

https://github.com/nachiketagudi/Map-Reduce/blob/master/SampleInput.txt

Place this file at /home/cloudera.

Map Reduce Executable File :

Download the Java file from the location below:

https://github.com/nachiketagudi/Map-Reduce/blob/master/SalesPerItem.java

In the Cloudera VM, open Eclipse, create a new Java project and name it MapReducePractice. In that project, create a new Java class named SalesPerItem in the package "com.nachiketa.mapreduce.example".

After creating the class, copy the contents of the downloaded SalesPerItem.java into it.

Set up the class path :

To set up the class path, follow the steps below.

Right click on the project MapReducePractice and go to >> Properties >> Java Build Path

Click on Add External JARs, navigate to /usr/lib/hadoop/client, select all the jars and click OK to add them. Click OK again to exit. The compilation errors should have disappeared by now.

Export the jar:

To export the project as a jar, follow the steps below.

Right click on the project MapReducePractice and go to >> Export >> JAR file

Under "Select the export destination", give the jar file name with its complete path:

/home/cloudera/MapReducePractice.jar

Setting up the input paths:

Use the commands below to create the input directory in HDFS and copy the sample file into it.

hadoop fs -mkdir /user/cloudera/mapreduce_input

hadoop fs -put /home/cloudera/SampleInput.txt /user/cloudera/mapreduce_input

Launch the mapreduce job:

Go to the directory where MapReducePractice.jar was exported, i.e. /home/cloudera/, and run the hadoop jar command. The general form is shown first, followed by the actual command for this example.

hadoop jar <jar name> <Class name with complete package> <input path> <output path>

hadoop jar MapReducePractice.jar com.nachiketa.mapreduce.example.SalesPerItem /user/cloudera/mapreduce_input /user/cloudera/mapreduce_output

Output :

Run the command below to check the output.

hadoop fs -cat /user/cloudera/mapreduce_output/part-00000

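To rerun this job, first remove the mapreduce_output folder with the command below, then run the hadoop jar command mentioned above again. The job will not start if the output directory already exists.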

hadoop fs -rm -R /user/cloudera/mapreduce_output
