Use Case:
A supermarket records its sales in a file. Whenever an item is sold, the item name, the number of units sold, and the cost per unit are written as a comma-separated line.
A sample file would look like the one below:
Apple,10,10
Mango,20,5
Guava,10,3
Banana,30,4
Apple,10,5
At the end of the day, we need to find the total sales for each item, i.e. the sum of units * unit cost across all lines for that item. For example, Apple appears twice, so its total is 10*10 + 10*5 = 150.
Expected Output:
Apple 150
Mango 100
Guava 30
Banana 120
Implementing in HIVE
Getting started with HIVE:
Open a terminal and type hive; this will open the Hive shell.
Create and use the sales database:
Create the database:
create database salesdb;
Use the database:
use salesdb;
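To confirm the database exists and is selected, you can run the standard Hive commands below (an optional sanity check, not required for the use case):
-- list available databases; salesdb should appear
show databases;
-- list tables in the current database (empty until the table is created in the next step)
show tables;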
Create sales table:
CREATE TABLE ITEM_SALES_RECORD (
ITEM_NAME string,
UNITS int,
UNIT_COST decimal)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ","
LINES TERMINATED BY "\n";
NOTE: Table names and column names are not case sensitive.
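To verify the table definition, you can describe it (optional check, assuming the table was created as above):
-- show the columns and their types
DESCRIBE ITEM_SALES_RECORD;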
Insert data into the table from a file:
Use the Java file below to generate the data:
https://github.com/nachiketagudi/Map-Reduce/blob/master/GenerateData.java
An input file will be generated at /home/cloudera/input.txt.
We can load the data into the table in two ways: by keeping the file in the local file system, or by first pushing it to HDFS.
Loading from the local file system:
load data local inpath '/home/cloudera/input.txt' into table item_sales_record;
Loading from HDFS:
Push the file to HDFS - hadoop fs -put input.txt
With no destination given, -put places the file in the user's HDFS home directory (typically /user/cloudera on the Cloudera VM), so load it from that path:
load data inpath '/user/cloudera/input.txt' into table item_sales_record;
Note that LOAD DATA INPATH moves the file from its HDFS location into Hive's warehouse directory rather than copying it.
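Whichever method you use, a quick spot check confirms the rows were loaded (optional; LIMIT simply caps the number of rows returned):
-- preview a few loaded rows
SELECT * FROM ITEM_SALES_RECORD LIMIT 5;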
Obtaining the output:
Use the query below, just like a regular SQL query, to obtain the desired output from the table:
SELECT ITEM_NAME,SUM(UNITS*UNIT_COST) FROM ITEM_SALES_RECORD GROUP BY ITEM_NAME;
This will launch MapReduce jobs and produce the output.
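If you prefer a named column and sorted results, a slight variation of the same query works (an optional tweak, not part of the original output requirement):
-- alias the aggregate and sort items by total sales
SELECT ITEM_NAME, SUM(UNITS*UNIT_COST) AS TOTAL_SALES
FROM ITEM_SALES_RECORD
GROUP BY ITEM_NAME
ORDER BY TOTAL_SALES DESC;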