java hadoop apache-pig user-defined-functions udf

Input parameter type for a Java UDF called from Pig on Hadoop


If I have the following data structure (a relation) in Pig and want to pass it to a Java UDF, what Java data type should the UDF's input parameter have?

(The student relation is a bag. Its schema is an ID of type int plus a tuple that contains an interest bag and a classes bag.)

student: {id: int,(interest: {(value: chararray)},classes: {(value: chararray)})}

Thanks in advance, Lin


Solution

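  • The UDF's exec method always receives a single Tuple that holds the arguments passed from Pig. Cast each field to the Java type Pig uses for it: int → Integer, chararray → String, tuple → Tuple, bag → DataBag. For a bag argument it can be done as shown below.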

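    import java.io.IOException;
    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.BagFactory;
    import org.apache.pig.data.DataBag;
    import org.apache.pig.data.Tuple;

    public class BagUdf extends EvalFunc<DataBag> {
        @Override
        public DataBag exec(Tuple input) throws IOException {
            DataBag returnVal = BagFactory.getInstance().newDefaultBag();
            // Assumes the bag is the first argument passed to the UDF
            for (Tuple t : (DataBag) input.get(0)) {
                returnVal.add(t); // process tuple t here
            }
            return returnVal;
        }
    }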
    

    Please refer to this link
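A fuller sketch for the student schema in the question, assuming the UDF is invoked with the whole record (for example FOREACH student GENERATE StudentSummary(*);), so that field 0 of the input tuple is the id and field 1 is the nested tuple of bags. The class name StudentSummary and its output format are hypothetical, chosen only to show where each Pig type surfaces in Java. Compiling it requires the Pig jar on the classpath.

```java
import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;

// Hypothetical UDF, assumed to be called as StudentSummary(*) on the student relation.
// Assumed input fields: 0 = id (int), 1 = (interest: bag, classes: bag).
public class StudentSummary extends EvalFunc<String> {

    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() < 2) {
            return null; // nothing usable to summarize
        }
        Integer id = (Integer) input.get(0);         // Pig int   -> java.lang.Integer
        Tuple nested = (Tuple) input.get(1);         // Pig tuple -> Tuple
        DataBag interest = (DataBag) nested.get(0);  // Pig bag   -> DataBag
        DataBag classes = (DataBag) nested.get(1);

        // Illustrative output format: "id|interest values|classes values"
        return id + "|" + joinValues(interest) + "|" + joinValues(classes);
    }

    // Each bag holds single-field tuples (value: chararray), read as String.
    private static String joinValues(DataBag bag) throws IOException {
        if (bag == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        for (Tuple t : bag) {
            if (sb.length() > 0) {
                sb.append(';');
            }
            sb.append((String) t.get(0)); // Pig chararray -> String
        }
        return sb.toString();
    }
}
```

Passing the whole record keeps the UDF signature to a single Tuple; calling it as StudentSummary(id, $1) would produce the same input layout. Under either assumption, the casts above are what answer the original question: there is no single Java parameter type for the relation, only a Tuple whose fields you cast individually.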