apache-pig

Pig FILTER and reusing the original dataset


I have a Pig input file that looks like this:

1, cornflakes, Regular, Post, 10
2, cornflakes, Regular,General Mills, 12
3, cornflakes, Mixed Nuts, Post, 14
4, chocolate syrup, Regular, Hersheys, 5
5, chocolate syrup, No High Fructose, Hersheys, 8
6, chocolate syrup, Regular, Ghirardeli, 6
7, chocolate syrup, Strawberry Flavor, Ghirardeli, 7

I need to filter for the cornflakes rows whose price is less than 12, and then use the original dataset again for the next filtering step.

-- The sample data is comma-separated, so the delimiter is ',' rather than '\t'.
total = LOAD 'location_of_file' USING PigStorage(',') AS (item_sl : int, item : chararray, type : chararray, manufacturer : chararray, price : int);
filter1 = FILTER total BY item == 'cornflakes' AND price < 12;

Now, after filter1, I need to use the original dataset again for the next filtering step.


Solution

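  • Use SPLIT, which partitions total in a single statement: filter1 receives the rows that match the condition and filter2 receives all the others.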

    total = LOAD '/output/systemhawk/file_inventory/test34.txt' USING PigStorage(',') AS (item_sl : int, item : chararray, type: chararray, manufacturer: chararray, price : int);
    SPLIT total INTO filter1 IF (item == 'cornflakes' AND price < 12), filter2 OTHERWISE;
    DUMP filter2;
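
    FILTER does not modify total. Each Pig statement defines a new relation, so the original alias stays available for later steps. Below is a minimal sketch of that, assuming the same schema as above and a comma-separated file with no spaces after the delimiters. The aliases cheap_cornflakes and cheap_syrup and the second condition are hypothetical, chosen only for illustration.

```pig
-- Load once; 'location_of_file' is the placeholder path from the question.
total = LOAD 'location_of_file' USING PigStorage(',')
        AS (item_sl:int, item:chararray, type:chararray, manufacturer:chararray, price:int);

-- First filter: defines a new relation and leaves total unchanged.
cheap_cornflakes = FILTER total BY item == 'cornflakes' AND price < 12;

-- Second filter: reads from total again, not from cheap_cornflakes.
-- (Hypothetical condition, for illustration only.)
cheap_syrup = FILTER total BY item == 'chocolate syrup' AND price < 7;

DUMP cheap_cornflakes;
DUMP cheap_syrup;
```

    With SPLIT ... OTHERWISE, by contrast, filter2 holds only the rows that did not match the first condition. It is the complement of filter1, not the full dataset. If the next step needs every row, filtering total a second time is probably the simpler choice.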
    
