apache-pig

Pig: remove tuples in inner bag


This is what the data looks like:

A: {id: int, data: {ARRAY_ELEM:(score:float, flag:boolean)}}
12, {(1.35, True), (2.46, False)}
13, {(0.98, False)}
14, {(0.23, True), (0.95, True)}

I want to remove all the tuples that are flagged False, without flattening the data. Expected output:

12, {(1.35, True)}
13, {}
14, {(0.23, True), (0.95, True)}

Is there a way I can do that in Pig Latin? Thank you!!


Solution

  • Try a nested FOREACH. It lets you FILTER the inner bag row by row, so the data never has to be flattened.

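    -- Load each row as an id plus a bag of (score, flag) tuples.
    A = LOAD 'input.txt' AS (id:int, data:bag{(score:float, flag:boolean)});
    B = FOREACH A {
        -- Keep only the tuples whose flag is true; the bag itself stays nested.
        filtered_data = FILTER data BY flag == true;
        GENERATE id, filtered_data;
    }
    -- $output must be supplied as a parameter, e.g. pig -param output=<path>.
    STORE B INTO '$output';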
    

    Note that your input file should not have any spaces before the boolean values. That is, "{(1.35, True), (2.46, False)}" should be written as "{(1.35,True), (2.46,False)}", with no space after the comma inside each tuple. Otherwise, you'll need to load the flags as chararray.
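
    If the spaces can't be removed from the input, the chararray fallback might look roughly like the sketch below. It is untested and assumes the rest of each line still parses as a bag. The TRIM and LOWER calls are my addition: they are Pig built-ins, used here in case the loaded string keeps its leading space or its capitalization.

    -- Same load as before, but flag is read as a string instead of a boolean.
    A = LOAD 'input.txt' AS (id:int, data:bag{(score:float, flag:chararray)});
    B = FOREACH A {
        -- Normalize the string, then keep only tuples flagged true.
        filtered_data = FILTER data BY LOWER(TRIM(flag)) == 'true';
        GENERATE id, filtered_data;
    }
    STORE B INTO '$output';

    With this version the flag in the output is still a chararray, so cast or convert it afterwards if a real boolean is needed downstream.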