This is what the data looks like:
A: {id: int, data: {ARRAY_ELEM:(score:float, flag:boolean)}}
12, {(1.35, True), (2.46, False)}
13, {(0.98, False)}
14, {(0.23, True), (0.95, True)}
I want to remove all the tuples that are flagged False, without flattening the data. Expected output:
12, {(1.35, True)}
13, {}
14, {(0.23, True), (0.95, True)}
Is there a way I can do that in Pig Latin? Thank you!!
Try a nested FOREACH, which lets you filter the bag inside each record without flattening it.
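-- Load each record as an id plus a bag of (score, flag) tuples.
A = LOAD 'input.txt' AS (id:int, data:bag{(score:float, flag:boolean)});
B = FOREACH A {
    -- Keep only the tuples whose flag is true; the bag is not flattened.
    filtered_data = FILTER data BY flag == true;
    GENERATE id, filtered_data;
}
STORE B INTO '$output';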
Note that the boolean values in your input file must not be preceded by a space. That is, "{(1.35,[space]True), (2.46,[space]False)}" should be "{(1.35,True), (2.46,False)}", with no space after the comma inside each tuple. Otherwise, you'll need to load the flags as chararray.
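If you do end up loading the flags as chararray, the filter has to compare strings instead. Here is a rough sketch of how that might look, assuming the loaded strings may carry stray whitespace or mixed case (hence the built-in TRIM and LOWER). I have not run this against your data, and the alias names are just placeholders:

```pig
-- Sketch only: flag is loaded as chararray instead of boolean.
A = LOAD 'input.txt' AS (id:int, data:bag{(score:float, flag:chararray)});
B = FOREACH A {
    -- Normalize the string (assumed to be something like ' True') before comparing.
    filtered_data = FILTER data BY LOWER(TRIM(flag)) == 'true';
    GENERATE id, filtered_data;
}
STORE B INTO '$output';
```

The flag stays a chararray in the output this way. If you need a real boolean downstream, you would have to convert it in a later step.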