I was trying my first UDF in pig and wrote the following function -
package com.pig.in.action.assignments.udf;
import org.apache.pig.EvalFunc;
import org.apache.pig.PigWarning;
import org.apache.pig.data.Tuple;
import java.io.IOException;
public class CountLength extends EvalFunc<Integer> {
public Integer exec(Tuple inputVal) throws IOException {
// Validate Input Value ...
if (inputVal == null ||
inputVal.size() == 0 ||
inputVal.get(0) == null) {
// Emit warning text for user, and skip this iteration
super.warn("Inappropriate parameter, Skipping ...",
PigWarning.SKIP_UDF_CALL_FOR_NULL);
return null;
}
// Count # of characters in this string ...
final String inputString = (String) inputVal.get(0);
return inputString.length();
}
}
However, when I try to use it as follows, Pig throws an error message that it not easy to understand atleast for me in the context of my UDF :
grunt> cat dept.txt;
10,ACCOUNTING,NEW YORK
20,RESEARCH,DALLAS
30,SALES,CHICAGO
40,OPERATIONS,BOSTON
grunt> dept = LOAD '/user/sgn/dept.txt' USING PigStorage(',') AS (dept_no: INT, d_name: CHARARRAY, d_loc: CHARARRAY);
grunt> d = FOREACH dept GENERATE dept_no, com.pig.in.action.assignments.udf.CountLength(d_name);
2015-06-02 16:24:13,416 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 2, column 79> mismatched input '(' expecting SEMI_COLON
Details at logfile: /home/sgn/pig_1433261973141.log
Can anyone help me figuring out whats wrong with this ?
I have gone through the documentation, but nothing seems obvious to me that is wrong in the sample above. Am I missing something here ?
These are the libraries I am using in pom.xml :
<dependency>
<groupId>org.apache.pig</groupId>
<artifactId>pig</artifactId>
<version>0.14.0</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-core</artifactId>
<version>1.2.1</version>
</dependency>
Is there any compatibility problem ?
Thanks,
-Vipul Pathak;
Found the reason of the problem after about 36 hours of downtime ...
The package name contains "IN" which somehow was the problem to Pig.
package com.pig.in.action.assignments.udf;
// ^^
When I changed the package name to the following, everything was good -
package com.pig.nnn.action.assignments.udf;
// ^^^
After building my modified UDF, I registered the Jar and Defined an alias for the function name and bingo, everything worked -
REGISTER /user/sgn/UDFs/Pig/CountLength-1.jar;
DEFINE CL com.pig.nnn.action.assignments.udf.CountLength;
. . .
. . .
d = FOREACH dept GENERATE dept_no, CL(d_name) AS DeptLength;
I don't recall if IN is a reserve word in Pig. But still presence of IN causes problem, (atleast in version 0.14.0 of Pig).