Handling Null DataType


I'm using the Over function from Piggybank to get the lag of a row:

res = foreach (group table by fieldA) {
    Aord = order table by fieldB;
    generate flatten(Stitch(Aord, Over(Aord.fieldB, 'lag'))) as (fieldA, fieldB, lag_fieldB);
}

This works correctly, and when I do a dump I get the expected result. The problem is that when I try to use lag_fieldB in any comparison or transformation, I get datatype issues. A describe returns:

fieldA: long,fieldB: chararray,lag_fieldB: NULL

I'm new to Pig, but I have already tried casting to chararray and using ToString(), and I keep getting errors like these:

ERROR 1052: Cannot cast bytearray to chararray

ERROR 1051: Cannot cast to bytearray

Thanks for your help


Solution

  • OK, after looking through the code of the Over function, I found that you can DEFINE an instance of the Over class with the return type as a constructor argument. What worked for me was:

    DEFINE ChOver org.apache.pig.piggybank.evaluation.Over('chararray');
    res = foreach (group table by fieldA) {
        Aord = order table by fieldB;
        generate flatten(Stitch(Aord, ChOver(Aord.fieldB, 'lag'))) as (fieldA, fieldB, lag_fieldB);
    }
    

    Now describe tells me:

    fieldA: long,fieldB: chararray,lag_fieldB: chararray
    

    I'm now able to use the columns as expected. I hope this saves someone else some time.
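
    As a minimal sketch of the kind of comparison that a typed lag_fieldB should allow, the filter below keeps rows whose fieldB differs from the previous row in the same group. The relation name `changed` is hypothetical, and the explicit null check assumes that the first row of each group has no lag value; neither comes from the original script.

    ```pig
    -- Assumes `res` was built with ChOver as above, so lag_fieldB is a chararray.
    -- `changed` is a hypothetical relation name used only for illustration.
    changed = FILTER res BY lag_fieldB IS NOT NULL AND fieldB != lag_fieldB;
    DUMP changed;
    ```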