
Pig UDF Throwing NullPointerException When Generating New Tuple


I have a Pig UDF which ingests some data and then attempts to transform that data in a minimal manner.

my_data = LOAD 'path/to/data' USING SomeCustomLoader();
my_other_data = FOREACH my_data GENERATE MyUDF(COL_1, COL_2, $param1, $param2) as output;
my_final_data = FOREACH my_other_data GENERATE output.NEW_COL1, output.NEW_COL2, output.NEW_COL3;

However, I keep getting the following error:

ERROR 0: Exception while executing [POUserFunc (Name: POUserFUnc(udf.MyUDF)[tuple] - scope-38 Operator Key: scope-38) children: null at []]: java.lang.NullPointerException

My UDF takes the data and transforms it:

public class MyUDF extends EvalFunc<Tuple> {
    public Tuple exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0)
            return null;

        TupleFactory _factory;

        Long fieldOne;
        String fieldTwo;
        String fieldThree;

        _factory.getInstance();

        try {
            fieldOne = Long.valueOf(input.get(0).toString());
            fieldTwo = input.get(1).toString();
            fieldThree = input.get(2).toString();

            fieldOne = doSomething(fieldOne);
            fieldTwo = doSomething(fieldTwo);
            fieldThree = doSomething(fieldThree);

            return _factory.newTuple(Arrays.asList(fieldOne, fieldTwo, fieldThree));

        } catch (Exception ex) {
            return _factory.newTuple(Arrays.asList("ParseException", "", "", ""));
        }
    }
}

I have debugged and confirmed that fieldOne, fieldTwo, and fieldThree do exist prior to calling the tuple factory. It's also clear that the exception is being thrown because the code reaches the catch block and then throws this NullPointerException error.

What is not clear is why on earth this is happening.

According to the Pig docs (Pig 0.14.0 API), I should be able to call newTuple(java.util.List c) with the relevant items.

I have also defined my own Schema to ensure the types are correct when going back to the pig script.


Solution

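  • The code in the question never assigns a TupleFactory instance to _factory. The statement _factory.getInstance(); calls the static method but discards its return value, so _factory never refers to a factory object, and the later call to _factory.newTuple(...) throws the NullPointerException. Assign the result of getInstance() before using it: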

    public class ... {
        TupleFactory _factory;
        public Tuple exec(Tuple input) {
            _factory = TupleFactory.getInstance();
            ...
        }
    }
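
To see why the original call compiles but does nothing useful, here is a minimal plain-Java sketch of the same pitfall. It does not use Pig: the nested Factory class is a hypothetical stand-in for TupleFactory, and the local variable is explicitly set to null as an assumption, so that the example compiles on its own.

```java
import java.util.Arrays;
import java.util.List;

public class StaticCallDemo {
    // Hypothetical stand-in for TupleFactory: a singleton with a static accessor.
    static class Factory {
        private static final Factory INSTANCE = new Factory();

        static Factory getInstance() {
            return INSTANCE;
        }

        List<Object> newTuple(List<Object> fields) {
            return fields;
        }
    }

    // Mirrors the question: the static call's result is discarded.
    static String broken() {
        Factory factory = null; // assumption: stands in for the never-assigned _factory
        factory.getInstance();  // static call through a null reference: legal, no NPE, result thrown away
        try {
            factory.newTuple(Arrays.<Object>asList(1L, "a", "b"));
            return "no exception";
        } catch (NullPointerException e) {
            return "NullPointerException on newTuple";
        }
    }

    // Mirrors the fix: keep the returned instance and call the method on it.
    static int fixed() {
        Factory factory = Factory.getInstance();
        return factory.newTuple(Arrays.<Object>asList(1L, "a", "b")).size();
    }

    public static void main(String[] args) {
        System.out.println(broken());
        System.out.println(fixed());
    }
}
```

The line that looks like initialization is only a static method call whose return value is dropped. The failure therefore appears later, at the first instance method call on the variable.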