Posts

Showing posts from 2017

Apache Spark, the end of MapReduce and Hadoop?

Is Apache Spark the end of MapReduce? The previous articles discussed the Hadoop framework and the way it processes data through MapReduce programs. However, the MapReduce framework has some drawbacks. Let us analyze the key differences between MapReduce and Apache Spark in detail. MapReduce runs in three phases. 1) The mapper program is fed data from HDFS, the corresponding metadata is fetched, and the map operation is applied to the data. 2) The intermediate data is written to the local filesystem instead of HDFS, so that the reducer can operate on the key-value pairs returned by the mapper. 3) The reducer picks up the data from the local filesystem (the output of the mapper) and writes its output back to HDFS. However, this design has inherent problems. Imagine a case where the MapReduce program fails due to some network/read/write error and no output is produced. This might not be a big issue in case the data read is sma...
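The three phases described in this excerpt can be sketched in plain Python. This is only an illustration under stated assumptions, not Hadoop code: a list of strings stands in for the HDFS input, a temporary directory stands in for the mapper node's local filesystem, the returned dictionary stands in for the output written back to HDFS, and the function names (`run_map_phase`, `run_reduce_phase`, `word_count`) are hypothetical. Word count is used as the example job.

```python
import json
import os
import tempfile
from collections import defaultdict


def run_map_phase(input_lines, spill_path):
    """Phases 1 and 2: apply the map operation and write (key, value) pairs to a local file."""
    with open(spill_path, "w", encoding="utf-8") as spill:
        for line in input_lines:
            for word in line.split():
                # Emit one (word, 1) pair per word, one JSON record per line.
                spill.write(json.dumps([word.lower(), 1]) + "\n")


def run_reduce_phase(spill_path):
    """Phase 3: read the mapper's local output, group by key, and sum the values."""
    totals = defaultdict(int)
    with open(spill_path, encoding="utf-8") as spill:
        for record in spill:
            key, value = json.loads(record)
            totals[key] += value
    return dict(totals)


def word_count(input_lines):
    # Assumption: a temporary directory plays the role of the local filesystem
    # where intermediate map output is kept instead of HDFS.
    with tempfile.TemporaryDirectory() as local_dir:
        spill_path = os.path.join(local_dir, "map_output.jsonl")
        run_map_phase(input_lines, spill_path)
        return run_reduce_phase(spill_path)


if __name__ == "__main__":
    sample = ["Spark and Hadoop", "Hadoop and MapReduce"]
    print(word_count(sample))
```

Note that every intermediate pair goes through a file on disk before the reduce step reads it back; that round trip is the part of the design the excerpt goes on to question.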

Let us 'Sqoop' it!

SQOOP - The bridge between traditional and novel big data systems. By now, we have seen articles on writing programs with the Hadoop MapReduce framework. However, all the operations were actually performed on sample text files. Here Sqoop comes to our rescue. Apache Sqoop is a tool developed to fetch data from and put data into traditional SQL-based data storage systems like MySQL, PostgreSQL, and MS SQL Server. Sqoop can also fetch data from and push data to NoSQL systems. This versatility comes from Sqoop's architecture, which is built on top of the MapReduce framework. Sqoop has an extension framework that makes it possible to import data from and export data to any external storage system that has bulk data transfer capabilities. A Sqoop connector is a modular component that uses this framework to enable Sqoop imports and exports. Sqoop comes with connectors for a range of popular databases, including MySQL, PostgreSQL, Oracle, SQL Server, DB2, and Netezza. Apart from the above connectors, Sqoop als...