PIG -UDF [USER DEFINED FUNCTION]

Here, we have taken one example and we have used both the UDF (user defined functions) i.e. making a string in upper case and taking a value & raising its power.

The dataset is depicted below which we are going to use in this example:

table

Our aim is to make 1st column letter in upper case and raising the power of the 2nd column with the value of 3rd column.

Let’s start with writing the java code for each UDF. Also we have to configure 4 JARs in our java project to avoid the compilation errors.
First, we will create java programs, both are given below:

Upper.java

import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
import org.apache.pig.impl.util.WrappedIOException;

@SuppressWarnings("deprecation")
public class Upper extends EvalFunc<String> {

public String exec(Tuple input) throws IOException {

if (input == null || input.size() == 0)

return null;

try {

String str = (String) input.get(0);

str=str.toUpperCase();

return str;

}
catch (Exception e) {

throw WrappedIOException.wrap("Caught exception processing input row ", e);

}
}
}

Power.java

import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.PigWarning;
import org.apache.pig.data.Tuple;

public class Pow extends EvalFunc<Long> {

public Long exec(Tuple input) throws IOException {
try {

int base = (Integer)input.get(0);
int exponent = (Integer)input.get(1);
long result = 1;

/* Probably not the most efficient method...*/
for (int i = 0; i < exponent; i++) {
long preresult = result;
result *= base;
if (preresult > result) {
// We overflowed. Give a warning, but do not throw an
// exception.
warn("Overflow!", PigWarning.TOO_LARGE_FOR_INT);
// Returning null will indicate to Pig that we failed but
// we want to continue execution.
return null;
}
}
return result;
} catch (Exception e) {
// Throwing an exception will cause the task to fail.
throw new IOException("Something bad happened!", e);
}
}
}

To remove compilation errors, we have to configure 4 JARs in our java project.

 

1

Download JARs

Now, we export JAR files for both the java codes. Please check the below steps for JAR creation.

Here, we have shown for one program, proceed in the same way in the next program as well.

2

 

 

 

 

 

3

4

5

After creating the JARs and text files, we have moved all the data to HDFS cluster, which is depicted by the following images:

 

a1

In our dataset, fields are comma (,) separated.

13

 

After moving the file, we have created script with .pig extension and put all the commands in that script file.

14

 

Now in terminal, type PIG followed by the name of the script file which is shown in the Above image:

 

Here, this is the output for running the pig script.

15

Got a question for us? Please mention them in the comments section and we will get back to you.

Steps to Create UDF in Apache Pig

Pig Script in Local Mode

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s