Run Your MapReduce Code Locally

In this blog, we explain in detail how to run your MapReduce code locally in Eclipse on any Linux machine.

After reading this blog, you will be able to run your MapReduce code in Eclipse without starting any of your Hadoop daemons.

Before getting started, let us learn a little about local mode and cluster mode.

Local Mode

In local mode, your MapReduce job runs in a single Java process on your own machine, without connecting to any other system or network. You do not need to start your Hadoop daemons, and you do not need to store your files in HDFS: you can simply specify local file paths.

Cluster Mode

A cluster is a collection of systems connected in a network. Cluster mode means running your program on that distributed collection of systems. Here you need to ensure that all your Hadoop daemons are started, and then run your MapReduce application by building a JAR file.

Running in cluster mode every time you test is not recommended, because each run writes its input, output and job files to HDFS, which wastes HDFS space and uses cluster resources that other jobs need. The default block size in Hadoop 2.x is 128 MB, so every file you store adds at least one block to HDFS, although a file smaller than one block only uses as much disk as its actual size (multiplied by the replication factor).
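To make the block arithmetic concrete, here is a minimal sketch that computes how many blocks a file is split into and how much raw disk it uses. It assumes the Hadoop 2.x defaults `dfs.blocksize` = 128 MB and `dfs.replication` = 3; the class and method names are hypothetical and only illustrate the arithmetic.

```java
/**
 * Hypothetical helper illustrating HDFS block arithmetic.
 * Assumes the Hadoop 2.x defaults: dfs.blocksize = 128 MB, dfs.replication = 3.
 */
final class HdfsBlockMath {
    static final long DEFAULT_BLOCK_SIZE = 128L * 1024 * 1024; // 134217728 bytes
    static final int DEFAULT_REPLICATION = 3;

    /** Number of blocks a file of the given size is split into (ceiling division). */
    static long blockCount(long fileBytes, long blockSize) {
        if (fileBytes <= 0) {
            return 0; // an empty file has no blocks
        }
        return (fileBytes + blockSize - 1) / blockSize;
    }

    /** Raw disk bytes across the cluster: the data is stored once per replica, not padded to a full block. */
    static long rawDiskBytes(long fileBytes, int replication) {
        return fileBytes * replication;
    }

    public static void main(String[] args) {
        long smallFile = 5L * 1024 * 1024; // a 5 MB test input
        System.out.println("blocks: " + blockCount(smallFile, DEFAULT_BLOCK_SIZE));
        System.out.println("raw disk bytes: " + rawDiskBytes(smallFile, DEFAULT_REPLICATION));
    }
}
```

For a 5 MB test input this gives one block and 15728640 bytes (15 MB) of raw disk across three replicas, rather than 128 MB.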

For testing your MapReduce program, you can run it in local mode rather than cluster mode.

Follow the procedure below to execute your MapReduce programs locally in Eclipse. This saves HDFS space as well as the time it takes to check your program.

1. Open Eclipse.

2. Create a Java project.

3. Create a new package (optional).

4. Create a new class.

5. Copy your program into that class.

To run in Eclipse, you need to add the dependencies, which means a few more JARs must be configured in your project's libraries:

  • All the JARs in the lib folder of Hadoop's common directory.
  • The hadoop-core-1.2.1 JAR (which needs to be imported externally).

To add the JAR files:

Right-click on the project –> Build Path –> Configure Build Path –> Libraries –> Add External JARs –> open your Hadoop folder –> share –> hadoop –> common –> lib

Add all the JARs in the lib folder.
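If you want to see exactly which JARs the "Add External JARs" dialog should pick up, a small standalone program can list them. This is a hypothetical helper, not part of Hadoop; it assumes the `HADOOP_HOME` environment variable points at your Hadoop installation folder.

```java
import java.io.File;
import java.util.Arrays;

/**
 * Hypothetical helper: lists the JARs in $HADOOP_HOME/share/hadoop/common/lib.
 * Assumes HADOOP_HOME is set to your Hadoop installation folder.
 */
final class CommonLibJars {

    /** Keeps only *.jar names and sorts them so the list is easy to compare. */
    static String[] jarNames(String[] fileNames) {
        return Arrays.stream(fileNames)
                .filter(name -> name.endsWith(".jar"))
                .sorted()
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String hadoopHome = System.getenv("HADOOP_HOME"); // assumption: set in your shell
        if (hadoopHome == null) {
            System.err.println("HADOOP_HOME is not set");
            return;
        }
        File libDir = new File(hadoopHome, "share/hadoop/common/lib");
        String[] names = libDir.list(); // null if the folder does not exist
        if (names == null) {
            System.err.println("No such folder: " + libDir);
            return;
        }
        for (String jar : jarNames(names)) {
            System.out.println(new File(libDir, jar).getAbsolutePath());
        }
    }
}
```

Filtering on the `.jar` suffix skips any non-JAR files that may sit in the same folder.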

 

Then you need to add another external JAR for the dependencies, i.e., hadoop-core-1.2.1.jar.

Download that JAR file from the link below:

https://drive.google.com/file/d/0ByJLBTmJojjzM2IwU1FPdmExLUE/view?usp=sharing

After downloading it, add this JAR to your libraries in the same way.
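To confirm the build path is complete before running a real job, you can run a small check from the same Eclipse project. This is a hypothetical sketch using plain Java reflection; the three class names it looks for are ones a typical MapReduce driver imports, and all of them are present in hadoop-core-1.2.1.

```java
/**
 * Hypothetical sanity check: run it from the same Eclipse project to confirm
 * that the Hadoop classes your program imports can actually be loaded.
 */
final class ClasspathCheck {

    /** True if the named class can be found by the current class loader. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize = false: only locate and load the class, do not run static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Classes a typical MapReduce driver imports.
        String[] required = {
            "org.apache.hadoop.conf.Configuration",
            "org.apache.hadoop.fs.Path",
            "org.apache.hadoop.mapreduce.Job",
        };
        for (String name : required) {
            System.out.println((isOnClasspath(name) ? "found   " : "MISSING ") + name);
        }
    }
}
```

Any line starting with "MISSING" means the corresponding JAR has not been added to the build path yet.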

Now you are ready to run your program in Eclipse.

To run it:

Right-click on the project –> Run As –> Run Configurations –> Main

In the Main tab, select your project and main class correctly.

 

Then move to the Arguments tab.

Here you need to enter your input file path and output file path in the Program arguments box, separated by whitespace (a space or a tab).
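Your driver reads these two values as `args[0]` (input) and `args[1]` (output). A common stumbling block on a second run is that Hadoop's `FileOutputFormat` refuses to write into an output directory that already exists. The hypothetical pre-flight check below flags that case, along with a missing input path or a wrong number of arguments; it is a sketch you could call at the top of your own `main`, not part of Hadoop.

```java
import java.io.File;

/**
 * Hypothetical pre-flight check for the Program arguments
 * (args[0] = input path, args[1] = output path).
 */
final class ArgsCheck {

    /** Returns an error message, or an empty string if the arguments look usable. */
    static String validate(String[] args, boolean inputExists, boolean outputExists) {
        if (args.length != 2) {
            return "Expected 2 arguments (input path, output path) but got " + args.length;
        }
        if (!inputExists) {
            return "Input path does not exist: " + args[0];
        }
        if (outputExists) {
            // FileOutputFormat fails the job if the output directory already exists.
            return "Output path already exists, delete it or pick a new one: " + args[1];
        }
        return "";
    }

    public static void main(String[] args) {
        // Relative paths resolve against the run configuration's working directory.
        boolean inputExists = args.length > 0 && new File(args[0]).exists();
        boolean outputExists = args.length > 1 && new File(args[1]).exists();
        String error = validate(args, inputExists, outputExists);
        System.out.println(error.isEmpty() ? "Arguments OK" : error);
    }
}
```

Passing the existence flags in as parameters keeps `validate` free of file-system access, so it is easy to test on its own.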

Now click Run. Your program will start running, and you can track its status in the console.

After the whole process finishes, an output folder will be created at the path you specified.

Inside that folder you will see a part file and a _SUCCESS file, which indicates that you have executed your program successfully in Eclipse locally.
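The check described above can be automated with a small hypothetical helper that looks for the `_SUCCESS` marker plus at least one file whose name starts with `part-`. When Hadoop writes to the local file system it also leaves hidden `.crc` checksum files next to these; the check ignores them. The default folder name `output` in `main` is only a placeholder.

```java
import java.io.File;

/**
 * Hypothetical helper: checks a local output folder for a _SUCCESS marker
 * and at least one part file.
 */
final class OutputCheck {

    /** True if the listing contains _SUCCESS and at least one name starting with "part-". */
    static boolean looksSuccessful(String[] fileNames) {
        boolean success = false;
        boolean part = false;
        for (String name : fileNames) {
            if (name.equals("_SUCCESS")) {
                success = true;
            } else if (name.startsWith("part-")) {
                part = true;
            }
        }
        return success && part;
    }

    public static void main(String[] args) {
        File outputDir = new File(args.length > 0 ? args[0] : "output"); // "output" is a placeholder
        String[] names = outputDir.list(); // null if the folder is missing
        System.out.println(names != null && looksSuccessful(names)
                ? "Job output looks complete in " + outputDir.getAbsolutePath()
                : "No complete job output in " + outputDir.getAbsolutePath());
    }
}
```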


With this approach:

  • you can test your MapReduce code and make changes easily before deploying it on a cluster
  • you save HDFS space

 
