Troubleshoot MapReduce Jobs in Hadoop – Part 1

In this blog, we will discuss how to troubleshoot common errors that may occur while executing a Java MapReduce program.

We will be publishing a series of blogs on ways to troubleshoot frequently faced errors in MapReduce, Pig and Hive.

In this first blog of the series, we address the common errors that one might encounter while executing a MapReduce program and the ways to handle them.

Let’s start our discussion with the different types of errors that may occur while executing a Java MapReduce program and ways to troubleshoot them.

1. Executing a MapReduce Program Without Starting the Hadoop Daemons

The first step before executing any Hadoop MapReduce program is to check whether all the Hadoop daemons are running. If they are not, start them first and only then execute the program.

In the figure above, all the Hadoop daemons have been stopped on purpose, so that you can see the error that occurs when the program is executed.

In the figure above, notice the error "failed on connection exception": the client could not connect to the Hadoop daemons because they are not running.

To fix this error, start all the Hadoop daemons and then run the command again. The daemons can be started with the start-all.sh script from the terminal.
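The "failed on connection exception" message wraps a java.net.ConnectException, which typically means no process was listening at the address the client tried. One way to confirm this before resubmitting is to probe that port directly. Below is a minimal, stdlib-only sketch; the class name NameNodeProbe and the default address localhost:9000 are assumptions, so substitute the host and port from fs.defaultFS in your core-site.xml.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class NameNodeProbe {
    // True if a TCP connection to host:port succeeds within timeoutMs.
    static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // "Connection refused", timeouts and unknown hosts all end up here.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: fs.defaultFS is hdfs://localhost:9000; change to match core-site.xml.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9000;
        if (isListening(host, port, 2000)) {
            System.out.println(host + ":" + port + " is accepting connections");
        } else {
            System.out.println("Nothing is listening on " + host + ":" + port
                + "; start the daemons (start-all.sh) before submitting the job");
        }
    }
}
```

A plain socket probe only shows that something is listening on the port, not that it is a healthy NameNode, but that is exactly the condition behind this particular error.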

Running the jps command now shows that all the Hadoop daemons have started successfully.
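Reading the jps listing by eye works, but the same check can also be scripted. The sketch below runs jps and reports which expected daemons are absent. The expected list assumes a single-node Hadoop 2.x/3.x (YARN) setup; on Hadoop 1.x, replace ResourceManager and NodeManager with JobTracker and TaskTracker. JpsCheck and missingDaemons are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class JpsCheck {
    // Assumed daemon set for a single-node Hadoop 2.x/3.x (YARN) cluster.
    static final List<String> EXPECTED = List.of(
        "NameNode", "DataNode", "SecondaryNameNode", "ResourceManager", "NodeManager");

    // jps prints one "<pid> <MainClassName>" pair per line.
    static List<String> missingDaemons(String jpsOutput, List<String> expected) {
        Set<String> running = new HashSet<>();
        for (String line : jpsOutput.split("\\R")) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length >= 2) {
                running.add(parts[1]);
            }
        }
        // Exact name match, so "SecondaryNameNode" does not count as "NameNode".
        List<String> missing = new ArrayList<>();
        for (String name : expected) {
            if (!running.contains(name)) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Process p = new ProcessBuilder("jps").redirectErrorStream(true).start();
        String output;
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            output = r.lines().collect(Collectors.joining("\n"));
        }
        p.waitFor();
        List<String> missing = missingDaemons(output, EXPECTED);
        System.out.println(missing.isEmpty()
            ? "All expected daemons are running"
            : "Not running: " + missing + " (try start-all.sh)");
    }
}
```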

Now, run the word count program from the wc_2.jar file.

As the figure above shows, no exception is displayed once the Hadoop daemons are running.

After the MapReduce program completes successfully, we can check the output files in the output directory wc_output.

We now know how to troubleshoot the error that occurs when a Hadoop program is run without starting the Hadoop daemons.

2. Not a Valid Jar

This error is displayed when the path given for the exported jar file in the run command is wrong, so no valid jar exists at that location.

To fix it, make sure a valid jar file has been exported and specify its correct path in the execution command.
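The jar path can also be checked up front, before the job is submitted. The following sketch (hypothetical class JarCheck) verifies that the path exists, is a regular file and can be opened as a jar, and reads the Main-Class entry from its manifest. It is a stand-alone check written for illustration, not Hadoop's own validation code.

```java
import java.io.File;
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

public class JarCheck {
    // Returns null if the path names a readable jar, otherwise a short reason.
    static String problemWith(String path) {
        File file = new File(path);
        if (!file.exists()) return "does not exist: " + file.getAbsolutePath();
        if (!file.isFile()) return "not a regular file: " + file.getAbsolutePath();
        try (JarFile jar = new JarFile(file)) {
            return null; // opened successfully as a jar (zip) archive
        } catch (IOException e) {
            return "cannot be read as a jar: " + e.getMessage();
        }
    }

    // Main-Class from the manifest, or null if the jar has none.
    static String mainClassOf(String path) throws IOException {
        try (JarFile jar = new JarFile(path)) {
            Manifest mf = jar.getManifest();
            return mf == null ? null : mf.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "wc_2.jar";
        String problem = problemWith(path);
        if (problem != null) {
            System.out.println("Not usable: " + problem);
            return;
        }
        String mainClass = mainClassOf(path);
        System.out.println(mainClass != null
            ? "OK, Main-Class: " + mainClass
            : "OK, but no Main-Class; pass the driver class after the jar path");
    }
}
```

If the manifest has no Main-Class entry, the driver class name has to follow the jar path in the hadoop jar command.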

In the figure below, note the export destination of the wc_2.jar file.

Now, run wc_2.jar from the command line, specifying the path it was exported to.

As the figure above shows, no exception is displayed once the correct jar file path is used in the execution command.

After the successful execution of the MapReduce program, we can check the output files in the output directory wc_output_1.

We have now learnt how to troubleshoot the error that occurs when the jar path is not valid.

3. Input Path Does Not Exist

This error is displayed when the input file passed to the program does not exist at the given HDFS path.

To fix it, first copy the input file to that HDFS path and then run the MapReduce program again.
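The upload itself is normally done with the HDFS shell. If you want to script it, a minimal Java sketch could wrap the hdfs dfs commands with ProcessBuilder, as below. It assumes Hadoop 2.x or later with the hdfs command on the PATH; StageInput and the example paths are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Paths;
import java.util.List;

public class StageInput {
    // "hdfs dfs -test -e PATH" exits with status 0 when PATH exists in HDFS.
    static List<String> existsCommand(String hdfsPath) {
        return List.of("hdfs", "dfs", "-test", "-e", hdfsPath);
    }

    // Create the target directory (no error if it already exists), then copy the local file into it.
    static List<List<String>> uploadCommands(String localFile, String hdfsDir) {
        return List.of(
            List.of("hdfs", "dfs", "-mkdir", "-p", hdfsDir),
            List.of("hdfs", "dfs", "-put", localFile, hdfsDir));
    }

    static int run(List<String> command) throws IOException, InterruptedException {
        return new ProcessBuilder(command).inheritIO().start().waitFor();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length < 2) {
            System.err.println("usage: StageInput <local-file> <hdfs-dir>");
            return;
        }
        String localFile = args[0]; // hypothetical, e.g. words.txt
        String hdfsDir = args[1];   // hypothetical, e.g. /wc_input
        String target = hdfsDir + "/" + Paths.get(localFile).getFileName();
        if (run(existsCommand(target)) == 0) {
            System.out.println(target + " is already in HDFS");
            return;
        }
        for (List<String> command : uploadCommands(localFile, hdfsDir)) {
            if (run(command) != 0) {
                throw new IOException("command failed: " + String.join(" ", command));
            }
        }
        System.out.println("Uploaded " + localFile + " to " + hdfsDir);
    }
}
```

Keeping command construction separate from execution makes the commands easy to inspect or log before anything touches the cluster.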

In the figure above, note that the input file is now stored in the HDFS path.

Now, run the same execution command and check the result.

As the figure above shows, no exception is displayed once the input file is in the HDFS path.

After the successful execution of the MapReduce program, we can check the output files in the output directory wc_output_2.

We now know how to troubleshoot the error that occurs if the input file path does not exist.

4. Output Path Already Exists

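This error is displayed when the output directory given in the command already exists in HDFS. Hadoop refuses to write job output into an existing directory, which protects the results of earlier runs from being overwritten.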

The solution is to specify an output directory name that does not match any directory already present in the HDFS path; the job creates the output directory itself.
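The runs in this post follow a simple naming pattern: wc_output, then wc_output_1, wc_output_2 and wc_output_3. That pattern is easy to automate. The sketch below (hypothetical class OutputDirName) picks the first name in the sequence that is not already taken; the set of existing names is hard-coded here for illustration and would in practice come from listing the parent directory with hdfs dfs -ls.

```java
import java.util.Set;

public class OutputDirName {
    // Return base if unused, otherwise the first of base_1, base_2, ... not in existing.
    static String nextFree(String base, Set<String> existing) {
        if (!existing.contains(base)) {
            return base;
        }
        int i = 1;
        while (existing.contains(base + "_" + i)) {
            i++;
        }
        return base + "_" + i;
    }

    public static void main(String[] args) {
        // Hypothetical listing of directories already present in HDFS.
        Set<String> existing = Set.of("wc_output", "wc_output_1", "wc_output_2");
        System.out.println(nextFree("wc_output", existing)); // prints wc_output_3
    }
}
```

If the old results are no longer needed, deleting the directory with hdfs dfs -rm -r <dir> before rerunning is the other common fix.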

No exception is displayed after changing the output directory name in the execution command.

After the successful execution of the MapReduce program, we can check the output files in the output directory wc_output_3.

We now know how to troubleshoot the error that occurs when the user tries to write the output of a new run into a directory that already exists in HDFS.

We hope this blog helped you understand some common errors that may occur while executing a Java MapReduce program and how to troubleshoot them. We will cover a few more error types, and how to handle them, in the next blog of this series.

Keep visiting our site for more updates on Big Data and other technologies.
