hadoop null apache-pig negate

How does Pig handle negating a null value?


I don't understand how Apache Pig (version r0.9.2) handles negation of null values. I have an expression like this:

nonEmpty = FILTER dataFields BY NOT IsEmpty(children);

If children is null, the IsEmpty function will return null. What confuses me is how the NOT operator will behave, since I would effectively have an expression like this:

nonEmpty = FILTER dataFields BY NOT NULL;

The Pig Latin r0.9.2 documentation says the following: "Pig does not support a boolean data type. However, the result of a boolean expression (an expression that includes boolean and comparison operators) is always of type boolean (true or false)." That only confuses me further, because it does not say what happens when the operand is null.

Thanks for the help in advance.


Solution

  • Testing a NULL for emptiness is probably not a good idea regardless. In fact, I tried it on 0.10.0, and it threw an error saying exactly that. Instead, filter by not null and not empty:

    nonEmpty = FILTER dataFields BY (children IS NOT NULL) AND (NOT IsEmpty(children));
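
For context, here is a minimal sketch of how that filter might sit in a complete script. The input file name `data.tsv` and the schema (an `id` plus a bag of child names) are assumptions made up for illustration; they are not from the question. As far as I understand the Pig documentation on nulls, a FILTER condition that evaluates to null does not pass the row through, so negating a null would not select it either, but I have not verified that on r0.9.2.

```pig
-- Hypothetical input: the file name and schema are assumptions, not from the question.
dataFields = LOAD 'data.tsv' AS (id:int, children:bag{t:tuple(name:chararray)});

-- Rows where the bag is missing entirely, split out so they can be inspected.
nullChildren = FILTER dataFields BY children IS NULL;

-- Same filter as above: keep rows whose bag is present and not empty.
nonEmpty = FILTER dataFields BY (children IS NOT NULL) AND (NOT IsEmpty(children));

DUMP nullChildren;
DUMP nonEmpty;
```

Dumping the null rows separately makes it easy to check on your own version which records are excluded because children is null rather than because the bag is empty.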