Troubleshoot MapReduce Jobs in Hadoop – Part 2

In the first post of this series, we discussed how to troubleshoot common errors that may occur while executing a Java MapReduce program (see Part 1). In this post, we continue with more of the errors one might encounter while running a MapReduce program, and the ways to handle them.

5. Inclusion/Exclusion of the Main Class Name in the Command Line While Executing the MapReduce Jar File

Scenario 1: Inclusion of the Main Class Name While Exporting the Jar

This error is displayed when the programmer explicitly includes the main class name in the command line even though the main class was already selected while exporting the jar.

Refer to the screenshot below. The name of the main class ‘WordCount’ has been included in the command line.

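An error is displayed: Exception in thread “main” org.apache.hadoop.mapred.FileAlreadyExistsException. Because the jar already names its main class, the extra class name in the command is passed to the program as its first argument. This shifts the input and output paths by one position, so the program most likely ends up treating an existing directory as its output path.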

To avoid this kind of error, leave the main class name out of the execution command for the MapReduce jar.

From the figure above, it is clear that no exception is displayed after excluding the main class name in the above command.

After the successful execution of the MapReduce program, we can check the output files in the output directory wc_output_5.

We have now learnt to troubleshoot the error that occurs when the main class name is included in the command line even though it was already selected while exporting the jar.

Scenario 2: Exclusion of the Main Class Name While Exporting the Jar

This error is displayed when the programmer leaves the main class name out of the command line and the main class was also not selected while exporting the jar.

Refer to the screenshot below. The name of the main class ‘WordCount’ has been excluded in the command line.

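In the figure above, an error is displayed: Exception in thread “main” java.lang.ClassNotFoundException. Since the jar does not name a main class, the first argument after the jar name is taken as the class name, and no class with that name exists in the jar.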

To avoid this kind of error, include the main class name in the execution command for the MapReduce jar.

From the figure above, it is clear that no exception is displayed after including the main class name in the above command.

After the successful execution of the MapReduce program, we can check the output files in the output directory wc_output_4.

We now know how to troubleshoot the error that occurs when the main class name is left out both while exporting the jar and in the command line.
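The two scenarios above can be sketched with a small model of how the command line is interpreted. This is a simplified illustration written for this post, not Hadoop's actual code: the class JarArgsDemo, the method resolve and the input path wc_input are hypothetical names, and the driver is assumed to read its first argument as the input path and its second as the output path.

```java
import java.util.Arrays;

// Simplified model (an assumption for illustration, not Hadoop's actual code) of how
// the tokens typed after the jar name are split into a main class and program arguments.
class JarArgsDemo {

    // Returns {mainClass, programArg0, programArg1, ...}.
    // manifestMainClass is null when no main class was selected while exporting the jar.
    static String[] resolve(String manifestMainClass, String... afterJar) {
        String mainClass;
        int firstProgramArg = 0;
        if (manifestMainClass != null) {
            // The jar names its main class: every token is handed to the program.
            mainClass = manifestMainClass;
        } else {
            // No main class in the jar: the first token is taken as the class name.
            mainClass = afterJar[0];
            firstProgramArg = 1;
        }
        int programArgCount = afterJar.length - firstProgramArg;
        String[] result = new String[1 + programArgCount];
        result[0] = mainClass;
        System.arraycopy(afterJar, firstProgramArg, result, 1, programArgCount);
        return result;
    }

    public static void main(String[] args) {
        // Scenario 1: main class selected at export AND typed in the command.
        // The program receives "WordCount" as its input path and "wc_input" as its
        // output path, and an already existing output path is rejected.
        System.out.println(Arrays.toString(
                resolve("WordCount", "WordCount", "wc_input", "wc_output_5")));

        // Scenario 2: main class neither selected at export nor typed in the command.
        // "wc_input" is taken as the class name, which does not exist in the jar.
        System.out.println(Arrays.toString(
                resolve(null, "wc_input", "wc_output_4")));
    }
}
```

In both cases the fix is the same idea: the class name must appear in exactly one place, either in the jar or in the command.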

6. Changing Method ‘map’ Name in the Program

This error occurs when the programmer renames the ‘map’ method in the program. The framework invokes the mapper through the method name ‘map’, so the mapper code must be written in a method with exactly that name. A method with any other name is never called by the framework.

The solution is to keep the mapper method name as ‘map’ in the program.

From the figure above, it is clear that no exception is displayed when the mapper method name is set to ‘map’ in the program.

After the successful execution of the MapReduce program, we can check the output files in the output directory wc_output_8.

We have now learnt to troubleshoot the error that occurs if a user changes the mapper method name ‘map’ to a different name.
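Why a renamed method is ignored can be sketched in plain Java. The classes below are stand-ins written for this post, not Hadoop's Mapper API: BaseMapper, BrokenMapper, WordMapper and run are hypothetical names, and the pass-through default in BaseMapper is an assumption modelled on a base class that forwards each record unchanged.

```java
import java.util.ArrayList;
import java.util.List;

class MapNameDemo {

    // Stand-in for a framework mapper base class: the default map() passes
    // the input record through unchanged.
    static class BaseMapper {
        void map(long offset, String line, List<String> out) {
            out.add(offset + "\t" + line);
        }
    }

    // Renamed method: this declares a new method instead of overriding map(),
    // so the caller below never reaches the word-splitting logic.
    static class BrokenMapper extends BaseMapper {
        void mapWords(long offset, String line, List<String> out) {
            for (String word : line.trim().split("\\s+")) {
                out.add(word + "\t1");
            }
        }
    }

    // Correct name and signature. @Override makes the compiler reject the class
    // if the method name or parameters do not match the base class.
    static class WordMapper extends BaseMapper {
        @Override
        void map(long offset, String line, List<String> out) {
            for (String word : line.trim().split("\\s+")) {
                out.add(word + "\t1");
            }
        }
    }

    // Stand-in for the framework: it only ever calls map().
    static List<String> run(BaseMapper mapper, String line) {
        List<String> out = new ArrayList<>();
        mapper.map(0L, line, out);
        return out;
    }

    public static void main(String[] args) {
        // One record: the whole line keyed by its offset, not one pair per word.
        System.out.println(run(new BrokenMapper(), "hello hadoop"));
        // One (word, 1) pair per word.
        System.out.println(run(new WordMapper(), "hello hadoop"));
    }
}
```

Annotating the mapper method with @Override is a cheap safeguard, since a misspelled method name then fails at compile time rather than at run time.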

7. Changing Method ‘reduce’ Name in the Program

If a programmer renames the reducer method ‘reduce’ in the program, the MapReduce framework never calls the renamed method. It behaves as if no reduce method had been written: the mapper’s key/value pairs are passed through without being aggregated, and they are what is stored in the output directory.

From the figure above, it is clear that only the mapper output is stored in the output directory (wc_output_9). The values for each key are not aggregated, since the reducer’s ‘reduce’ method has been given a different name, as shown in red.

Hence, the user should never change the reducer method name ‘reduce’ to any other name in the program.

We have set the reducer method name as ‘reduce’ in the program and exported the jar wc_2.jar.

From the figure above, it is clear that no exception is displayed when the reducer method name is set to ‘reduce’ in the program.

After successful execution of the MapReduce program, we can check the output files in the output directory wc_output_9.

We have now learnt to troubleshoot the error that occurs when a user changes the reducer method name ‘reduce’ to any other name.
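The effect of a renamed ‘reduce’ method can be sketched the same way. The classes below are stand-ins written for this post, not Hadoop's Reducer API: BaseReducer, BrokenReducer, SumReducer and run are hypothetical names, and the pass-through default in BaseReducer is an assumption modelled on a base class that writes each value out with its key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ReduceNameDemo {

    // Stand-in for a framework reducer base class: the default reduce()
    // writes every value through with its key, without aggregating.
    static class BaseReducer {
        void reduce(String key, List<Integer> values, List<String> out) {
            for (int value : values) {
                out.add(key + "\t" + value);
            }
        }
    }

    // Renamed method: a new method, not an override, so it is never called.
    static class BrokenReducer extends BaseReducer {
        void reduceCounts(String key, List<Integer> values, List<String> out) {
            int sum = 0;
            for (int value : values) {
                sum += value;
            }
            out.add(key + "\t" + sum);
        }
    }

    // Correct name and signature: one summed line per key.
    static class SumReducer extends BaseReducer {
        @Override
        void reduce(String key, List<Integer> values, List<String> out) {
            int sum = 0;
            for (int value : values) {
                sum += value;
            }
            out.add(key + "\t" + sum);
        }
    }

    // Stand-in for the framework: calls reduce() once per key, in key order.
    static List<String> run(BaseReducer reducer, Map<String, List<Integer>> grouped) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> entry : new TreeMap<>(grouped).entrySet()) {
            reducer.reduce(entry.getKey(), entry.getValue(), out);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        grouped.put("hadoop", List.of(1, 1));
        grouped.put("hello", List.of(1));

        // Un-aggregated (word, 1) pairs, i.e. what the mapper emitted.
        System.out.println(run(new BrokenReducer(), grouped));
        // One total per word.
        System.out.println(run(new SumReducer(), grouped));
    }
}
```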

8. Required Packages Are Not Imported in the Program

When a programmer doesn’t import the required packages in the program and still exports the jar and runs it in the terminal, an error will be displayed.

To troubleshoot the error, import all the required packages in the program and then run the exported MapReduce jar file in the terminal.

Here, we import the required packages for our program and export the program as a jar again.

From the figure above, it is clear that no exception is displayed since we have imported all the required packages for our class file.

After the successful execution of the MapReduce program, we can check the output files in the output directory wc_output_11.

We have now learnt to troubleshoot the error that occurs if a user doesn’t import the required packages in the program and still tries to run the MapReduce jar file in the terminal.

9. Input or Output Parameter Type Mismatch in the Program

This type of error occurs when the input or output parameter types declared in the program do not match the data actually being read or written, and the programmer tries to run the program.

For the word count program above, the input file is in text format, so the input key type is LongWritable (the byte offset of each line in the file) and the input value type is Text.

As shown in the figure above, a Java IOException is displayed with the message “type mismatch in key from map,” which indicates that the types defined in the mapper code do not match.

To solve this problem, set the correct parameter types for the file format in use, and then run the exported MapReduce jar file in the terminal.

Now, let’s set the input value type to Text in the program and export the jar.

From the figure above, it is clear that no exception is displayed since we have used the correct input value type in the program.

After the successful execution of the MapReduce program, we can check the output files in the output directory wc_output_12.

We have now learnt to troubleshoot the error that occurs when a user doesn’t mention the correct parameter type in the program and tries to execute the MapReduce jar file in the terminal.
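The kind of check behind a “type mismatch in key from map” error can be sketched in plain Java. This is a stand-in written for this post, not Hadoop's code: Collector and tryEmit are hypothetical names, java.lang.String and java.lang.Long stand in for Text and LongWritable, and the message wording beyond the phrase quoted above is an assumption.

```java
import java.io.IOException;

class TypeMismatchDemo {

    // Stand-in for the framework's output collector: it knows which key class
    // the job declared for the map output and rejects anything else.
    static class Collector {
        private final Class<?> declaredKeyClass;

        Collector(Class<?> declaredKeyClass) {
            this.declaredKeyClass = declaredKeyClass;
        }

        void collect(Object key, Object value) throws IOException {
            if (key.getClass() != declaredKeyClass) {
                throw new IOException("Type mismatch in key from map: expected "
                        + declaredKeyClass.getName()
                        + ", received " + key.getClass().getName());
            }
            // A real collector would buffer the pair here; the sketch only checks types.
        }
    }

    // Emits one pair and reports either "ok" or the error message.
    static String tryEmit(Class<?> declaredKeyClass, Object key) {
        try {
            new Collector(declaredKeyClass).collect(key, 1);
            return "ok";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Declared key type matches the emitted key: accepted.
        System.out.println(tryEmit(String.class, "hello"));
        // Declared key type is String but a Long (e.g. a line offset) is emitted: rejected.
        System.out.println(tryEmit(String.class, 0L));
    }
}
```

The point of the sketch is that the check happens at run time against the types the job declares, so the types in the mapper's signature, the pairs it emits and the types declared for the job all have to agree.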

We hope this post helped you understand the common errors that may occur during execution of a Java MapReduce program and how to troubleshoot them. In the next post of this series, we will cover Hive and Pig error types and the procedure to handle those errors.

Keep visiting our site for more updates on BigData and other technologies.
