Use Case:
A supermarket records its sales in a file. Whenever an item is sold, the item name, the number of units sold, and the cost per unit are written as a comma-separated line.
A sample file would look like the one below:
Apple,10,10
Mango,20,5
Guava,10,3
Banana,30,4
Apple,10,5
At the end of the day, we need to find the total sales for each item, i.e. the sum of units * unit cost across all lines for that item. For example, Apple appears twice, so its total is 10*10 + 10*5 = 150.
Expected Output:
Apple 150
Mango 100
Guava 30
Banana 120
Implementing in HIVE
Getting started with HIVE:
Open a terminal and type hive; this will open the Hive shell.
Create and use the sales database:
Create the database:
create database salesdb;
Use the database:
use salesdb;
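To confirm the database exists and is selected, you can run the standard Hive commands below (an optional sanity check, not required for the use case):
-- list available databases; salesdb should appear
show databases;
-- list tables in the current database (empty until the table is created in the next step)
show tables;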
Create sales table:
CREATE TABLE ITEM_SALES_RECORD (
ITEM_NAME string,
UNITS int,
UNIT_COST decimal)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ","
LINES TERMINATED BY "\n";
NOTE: Table names and column names are not case sensitive.
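To verify the table definition, you can describe it (optional check, assuming the table was created as above):
-- show the columns and their types
DESCRIBE ITEM_SALES_RECORD;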
Insert data into the table from a file:
Use the Java file below to generate the data:
https://github.com/nachiketagudi/Map-Reduce/blob/master/GenerateData.java
An input file will be generated at /home/cloudera/input.txt.
We can load the data into the table in two ways: by keeping the file in the local file system, or by first pushing it to HDFS.
Loading from the local file system:
load data local inpath '/home/cloudera/input.txt' into table item_sales_record;
Loading from HDFS:
Push the file to HDFS - hadoop fs -put input.txt
With no destination given, -put places the file in the user's HDFS home directory (typically /user/cloudera on the Cloudera VM), so load it from that path:
load data inpath '/user/cloudera/input.txt' into table item_sales_record;
Note that LOAD DATA INPATH moves the file from its HDFS location into Hive's warehouse directory rather than copying it.
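Whichever method you use, a quick spot check confirms the rows were loaded (optional; LIMIT simply caps the number of rows returned):
-- preview a few loaded rows
SELECT * FROM ITEM_SALES_RECORD LIMIT 5;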
Obtaining the output:
Use the query below, just like a regular SQL query, to obtain the desired output from the table:
SELECT ITEM_NAME,SUM(UNITS*UNIT_COST) FROM ITEM_SALES_RECORD GROUP BY ITEM_NAME;
This will launch MapReduce jobs and produce the output.
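If you prefer a named column and sorted results, a slight variation of the same query works (an optional tweak, not part of the original output requirement):
-- alias the aggregate and sort items by total sales
SELECT ITEM_NAME, SUM(UNITS*UNIT_COST) AS TOTAL_SALES
FROM ITEM_SALES_RECORD
GROUP BY ITEM_NAME
ORDER BY TOTAL_SALES DESC;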