I have this code.
large = load 'a super large file';
CC = FILTER large BY $19 == 'abc' OR $20 == 'abc'
OR $19 == 'def' OR $20 == 'def' ....;
The number of OR conditions can run into the hundreds or even thousands.
Is there a better way to do this?
Yes, put those conditions in another file. Load it into a relation and join the two relations on the column. If you have to filter on multiple columns, create one filter file per column. Below is an example for two columns:
large = load 'a super large file';
filter1 = load 'file with values needed to compare with $19';
filter2 = load 'file with values needed to compare with $20';
f1 = JOIN large BY $19,filter1 BY $0;
f2 = JOIN large BY $20,filter2 BY $0;
final = UNION f1,f2;
DUMP final;
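One caveat with the join-and-union approach: each JOIN appends the filter column to the row, and UNION does not remove duplicates, so a row whose $19 and $20 both match shows up twice. A possible way around this, sketched below with the same placeholder paths, is to emulate a semi-join with COGROUP so that only the columns of large are kept, and then apply DISTINCT. The aliases g1, g2, k1, k2, m1, m2, both and deduped are made up for illustration.

```pig
large   = load 'a super large file';
filter1 = load 'file with values needed to compare with $19';
filter2 = load 'file with values needed to compare with $20';

-- Group rows of large with the filter values sharing the same key.
-- Each output tuple holds the key plus one bag per input relation.
g1 = COGROUP large BY $19, filter1 BY $0;
-- Keep only keys that actually occur in the filter file.
k1 = FILTER g1 BY NOT IsEmpty(filter1);
-- Flatten the bag of large rows back out; the filter column is not carried along.
m1 = FOREACH k1 GENERATE FLATTEN(large);

-- Same steps for the second column.
g2 = COGROUP large BY $20, filter2 BY $0;
k2 = FILTER g2 BY NOT IsEmpty(filter2);
m2 = FOREACH k2 GENERATE FLATTEN(large);

-- Rows matching on both columns appear in m1 and m2; DISTINCT collapses them.
both    = UNION m1, m2;
deduped = DISTINCT both;
DUMP deduped;
```

Note that DISTINCT would also collapse rows that were already exact duplicates in the input file, and that COGROUP shuffles the large relation, so this trades some speed for a cleaner result.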
Alternatively, you can probably use a single filter file with one column per filtered field, join on each of those columns to get the separate filtered results, and then union the relations:
large = load 'a super large file';
filter_file = load 'file with values in different columns';
f1 = JOIN large BY $19,filter_file BY $0;
f2 = JOIN large BY $20,filter_file BY $1;
final = UNION f1,f2;
DUMP final;
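If the filter file is small enough to fit in memory, the joins could probably be done map-side with Pig's fragment-replicate join, which avoids shuffling the large relation. A minimal sketch, assuming the filter file really is small; the large relation has to be listed first in each JOIN:

```pig
large = load 'a super large file';
filter_file = load 'file with values in different columns';

-- USING 'replicated' loads the relation(s) listed after the first one
-- into memory on every mapper, so no reduce phase is needed for the join.
f1 = JOIN large BY $19, filter_file BY $0 USING 'replicated';
f2 = JOIN large BY $20, filter_file BY $1 USING 'replicated';
final = UNION f1, f2;
DUMP final;
```

If the filter relation does not fit in memory, the replicated join fails at runtime, in which case the default join shown above is the safer choice.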