Use Case:
A supermarket records its sales in a file. Whenever an item is sold, the item name, the number of units sold, and the cost of each unit are written as one comma-separated line.
A sample file looks like this:
Apple,10,10
Mango,20,5
Guava,10,3
Banana,30,4
Apple,10,5
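Each line therefore carries three fields. Here is a minimal parsing sketch in Java. It assumes whole-number units and costs, as in the sample; prices with cents would call for BigDecimal. The class name SaleRecord and its fields are hypothetical and are not taken from SalesPerItem.java.

```java
/**
 * One line of the sales file: item name, units sold, cost per unit.
 * Hypothetical helper for illustration; assumes whole-number units and costs.
 */
final class SaleRecord {
    final String item;
    final long units;
    final long unitCost;

    SaleRecord(String item, long units, long unitCost) {
        this.item = item;
        this.units = units;
        this.unitCost = unitCost;
    }

    /** Parses "Item,units,unitCost"; throws IllegalArgumentException on malformed input. */
    static SaleRecord parse(String line) {
        String[] fields = line.split(",");
        if (fields.length != 3) {
            throw new IllegalArgumentException("Expected 3 fields but got " + fields.length + ": " + line);
        }
        try {
            return new SaleRecord(fields[0].trim(),
                    Long.parseLong(fields[1].trim()),
                    Long.parseLong(fields[2].trim()));
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Non-numeric units or cost: " + line, e);
        }
    }

    /** Sales amount for this line: units sold times cost per unit. */
    long amount() {
        return units * unitCost;
    }

    public static void main(String[] args) {
        SaleRecord r = SaleRecord.parse("Apple,10,10");
        System.out.println(r.item + " " + r.amount()); // prints "Apple 100"
    }
}
```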
At the end of the day, we need the total sales amount for each item, that is, the sum of units sold multiplied by unit cost over all of that item's lines.
Expected Output (MapReduce sorts the keys, so the items appear in alphabetical order):
Apple 150
Banana 120
Guava 30
Mango 100
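The expected figures come from multiplying units by unit cost on each line and adding up the lines that belong to the same item:

```latex
\text{total}(i) = \sum_{r \,:\, \text{item}(r) = i} \text{units}(r) \times \text{cost}(r)

\begin{aligned}
\text{Apple}  &= 10 \times 10 + 10 \times 5 = 150 \\
\text{Banana} &= 30 \times 4 = 120 \\
\text{Guava}  &= 10 \times 3 = 30 \\
\text{Mango}  &= 20 \times 5 = 100
\end{aligned}
```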
Input file:
Download the input file from the location below and save it as /home/cloudera/SampleInput.txt:
https://github.com/nachiketagudi/Map-Reduce/blob/master/SampleInput.txt
MapReduce Source File:
Download the Java file from the location below:
https://github.com/nachiketagudi/Map-Reduce/blob/master/SalesPerItem.java
In the Cloudera VM, open Eclipse and create a new Java project named MapReducePractice. In that project, create a new Java class named SalesPerItem in the package "com.nachiketa.mapreduce.example"; Eclipse saves it as SalesPerItem.java.
Then replace the contents of the new file with the contents of the downloaded SalesPerItem.java.
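The Hadoop code itself lives in the downloaded file and is not reproduced here. As a rough, Hadoop-free illustration of the idea, the sketch below simulates the three MapReduce phases in plain Java: the map step emits (item, units × cost) for each line, the shuffle groups those values by item in sorted key order, and the reduce step sums them. The class and method names are hypothetical, and the real SalesPerItem.java may divide the work differently (for example, by doing the multiplication in the reducer).

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Plain-Java simulation of map, shuffle, and reduce for the sales use case.
 * Hypothetical illustration only; no Hadoop classes are used.
 */
final class SalesMapReduceSimulation {

    /** Map phase: one input line -> (item, units * unitCost). */
    static Map.Entry<String, Long> map(String line) {
        String[] f = line.split(",");
        long amount = Long.parseLong(f[1].trim()) * Long.parseLong(f[2].trim());
        return new AbstractMap.SimpleEntry<>(f[0].trim(), amount);
    }

    /** Shuffle phase: group mapper outputs by key; TreeMap keeps keys sorted, like Hadoop's sort step. */
    static TreeMap<String, List<Long>> shuffle(List<Map.Entry<String, Long>> pairs) {
        TreeMap<String, List<Long>> grouped = new TreeMap<>();
        for (Map.Entry<String, Long> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    /** Reduce phase: sum all amounts for one item (the key is unused, mirroring a reducer's signature). */
    static long reduce(String item, List<Long> amounts) {
        long total = 0;
        for (long a : amounts) {
            total += a;
        }
        return total;
    }

    /** Runs all three phases over the input lines, skipping blank lines. */
    static TreeMap<String, Long> run(List<String> lines) {
        List<Map.Entry<String, Long>> mapped = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                mapped.add(map(line));
            }
        }
        TreeMap<String, Long> result = new TreeMap<>();
        for (Map.Entry<String, List<Long>> group : shuffle(mapped).entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Apple,10,10", "Mango,20,5", "Guava,10,3", "Banana,30,4", "Apple,10,5");
        for (Map.Entry<String, Long> e : run(lines).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Running main prints one tab-separated line per item in alphabetical order: Apple 150, Banana 120, Guava 30, Mango 100, matching the expected output.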
Set up the classpath:
To set up the classpath, follow these steps:
Right-click the project MapReducePractice and go to Properties >> Java Build Path.
On the Libraries tab, click Add External JARs..., navigate to /usr/lib/hadoop/client, select all the JARs, and click OK to add them. Click OK again to close the Properties dialog. The compilation errors should now be gone.
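One quick way to confirm the build path is a small class that tries to load a few Hadoop classes by name without initializing them. The sketch below is a hypothetical helper (ClasspathCheck is not part of the original project). Run it from Eclipse after adding the JARs: any class reported as missing means the build path is still incomplete.

```java
/**
 * Hypothetical helper: reports whether classes can be loaded from the current classpath.
 */
final class ClasspathCheck {

    /** Returns true if the named class can be found; does not run its static initializers. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Classes shipped in the Hadoop client JARs.
        String[] hadoopClasses = {
            "org.apache.hadoop.conf.Configuration",
            "org.apache.hadoop.fs.Path",
            "org.apache.hadoop.io.Text",
            "org.apache.hadoop.mapreduce.Job",
        };
        for (String name : hadoopClasses) {
            System.out.println(name + (isOnClasspath(name) ? " found" : " MISSING"));
        }
    }
}
```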
Export the JAR:
To export the project as a JAR, follow these steps:
Right-click the project MapReducePractice and go to Export >> Java >> JAR file.
Under "Select the export destination", enter the full path of the JAR file:
/home/cloudera/MapReducePractice.jar
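To double-check that the exported JAR really contains the compiled class, here is a small sketch using java.util.jar.JarFile. JarContentsCheck is a hypothetical name, and the entry name is derived from the package and class name used above. The JDK's `jar tf` command lists the same entries from a terminal.

```java
import java.io.IOException;
import java.util.jar.JarFile;

/**
 * Hypothetical helper: checks that a JAR contains the .class entry for a fully qualified class name.
 */
final class JarContentsCheck {

    /** Returns true if the JAR has an entry such as com/nachiketa/mapreduce/example/SalesPerItem.class. */
    static boolean containsClass(String jarPath, String className) throws IOException {
        String entryName = className.replace('.', '/') + ".class";
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getJarEntry(entryName) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        boolean ok = containsClass("/home/cloudera/MapReducePractice.jar",
                "com.nachiketa.mapreduce.example.SalesPerItem");
        System.out.println(ok ? "SalesPerItem.class found" : "SalesPerItem.class missing");
    }
}
```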
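Set up the input path:
Use the commands below to create the input directory in HDFS and copy the sample file into it. Do not create the output directory yourself: the job creates it and fails if it already exists.
hadoop fs -mkdir /user/cloudera/mapreduce_input
hadoop fs -put /home/cloudera/SampleInput.txt /user/cloudera/mapreduce_input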
Launch the MapReduce job:
Go to the directory where MapReducePractice.jar was exported (/home/cloudera/) and run the command below. The general form is:
hadoop jar <jar name> <fully qualified class name> <input path> <output path>
For this example:
hadoop jar MapReducePractice.jar com.nachiketa.mapreduce.example.SalesPerItem /user/cloudera/mapreduce_input /user/cloudera/mapreduce_output
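`hadoop jar` passes the arguments that follow the class name to that class's main method, so the input and output paths presumably arrive as args[0] and args[1]. The sketch below shows how a driver could validate them. DriverArgs is an illustrative name, and the downloaded SalesPerItem.java may handle its arguments differently, for example through ToolRunner, which strips generic options such as -D first.

```java
/**
 * Hypothetical sketch: validating the two path arguments that
 * `hadoop jar <jar> <class> <input path> <output path>` hands to main().
 */
final class DriverArgs {
    final String inputPath;   // e.g. /user/cloudera/mapreduce_input
    final String outputPath;  // e.g. /user/cloudera/mapreduce_output (must not exist yet)

    private DriverArgs(String inputPath, String outputPath) {
        this.inputPath = inputPath;
        this.outputPath = outputPath;
    }

    /** Expects exactly two arguments: input path first, then output path. */
    static DriverArgs parse(String[] args) {
        if (args.length != 2) {
            throw new IllegalArgumentException(
                    "Usage: SalesPerItem <input path> <output path> (got " + args.length + " arguments)");
        }
        return new DriverArgs(args[0], args[1]);
    }

    public static void main(String[] args) {
        try {
            DriverArgs parsed = parse(args);
            System.out.println("input=" + parsed.inputPath + " output=" + parsed.outputPath);
        } catch (IllegalArgumentException e) {
            System.err.println(e.getMessage());
            System.exit(2);
        }
    }
}
```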
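Output:
Run this command to check the output:
hadoop fs -cat /user/cloudera/mapreduce_output/part-00000
The file is named part-00000 when the job uses the older org.apache.hadoop.mapred API and part-r-00000 with the newer org.apache.hadoop.mapreduce API; hadoop fs -cat '/user/cloudera/mapreduce_output/part-*' works in either case.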
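To check the totals programmatically, for example against the expected output listed earlier, here is a minimal parsing sketch. It assumes the default TextOutputFormat, which writes each key and value separated by a tab, and item names without spaces. OutputParser is a hypothetical name. Copy the part file out of HDFS first, for example with hadoop fs -get.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper: parses "item<TAB>total" lines from a job output file into a map.
 * Splitting on any whitespace also accepts the space-separated form shown in this post.
 */
final class OutputParser {

    static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> totals = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // ignore blank lines
            }
            String[] kv = trimmed.split("\\s+");
            if (kv.length != 2) {
                throw new IllegalArgumentException("Unexpected line: " + line);
            }
            totals.put(kv[0], Long.parseLong(kv[1]));
        }
        return totals;
    }

    public static void main(String[] args) throws IOException {
        // Assumes the part file was copied to the current directory beforehand.
        List<String> lines = Files.readAllLines(Paths.get("part-00000"));
        System.out.println(parse(lines));
    }
}
```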
To rerun the job, first remove the mapreduce_output directory, then run the hadoop jar command above again:
hadoop fs -rm -R /user/cloudera/mapreduce_output