
PIG REPLACE with NULL


I have three values A, B and C.

I want to be able to replace the value of C with a NULL value if A AND B have values in their cells.

I'm unsure how to proceed. I've tried something like:

FOR EACH X GENERATE REPLACE(C, ((A IS NOT NULL AND B IS NOT NULL) ? NULL:C) ;

But I'm unsure whether this will work; it doesn't seem right. I don't want to add any more fields, just update the value of C.

Maybe something like

FOR EACH X GENERATE (A IS NOT NULL AND B IS NOT NULL) ? NULL:C AS NEW_C;

Then drop C, whilst retaining A, B and NEW_C?


Solution

  • You can simply do:

    Y = FOREACH X GENERATE A, B, (A IS NOT NULL AND B IS NOT NULL ? NULL : C) AS C;
    

    There is no need to create NEW_C and then drop C: FOREACH ... GENERATE carries only the fields you explicitly name into the new relation (or all of them, if you use GENERATE *). Aliasing the conditional expression AS C therefore gives Y the fields A, B and the updated C, and nothing else.
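
For context, here is a minimal end-to-end sketch of the same statement. The file name 'input.csv', the comma delimiter and the chararray types are assumptions made for illustration; they are not part of the original question. The condition is wrapped in its own parentheses only to make the grouping explicit.

```pig
-- Hypothetical input file and schema, for illustration only.
X = LOAD 'input.csv' USING PigStorage(',') AS (A:chararray, B:chararray, C:chararray);

-- Keep A and B unchanged. C becomes null when both A and B are non-null;
-- otherwise the original value of C is carried through.
Y = FOREACH X GENERATE A, B, ((A IS NOT NULL AND B IS NOT NULL) ? NULL : C) AS C;

-- Y has exactly three fields: A, B and the updated C.
DESCRIBE Y;
DUMP Y;
```

DESCRIBE Y is a quick way to confirm that no extra field such as NEW_C has been introduced.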