Merge two bags and keep all the fields from the first bag in Pig


I am new to Pig scripting and need some help with this issue.

I have two bags in Pig. I want to keep all the fields from the first bag, but overwrite a field in the first bag whenever the second bag has data for that same field.

The column list is dynamic (columns may be added or deleted at any time). Set b may also carry data in other fields that are currently blank; if it does, that data must overwrite the corresponding fields of set a.

columns - uniqueid,catagory,b,c,d,e,f,region,g,h,date,direction,indicator

Example:

    all_data = COGROUP a BY (uniqueid), b BY (uniqueid);

Output:

(1,{(1,test,,,,,,,,city,,,,,2020-06-08T18:31:09.000Z,west,,,,,,,,,,,,,A)},{(1,,,,,,,,,,,,,,2020-09-08T19:31:09.000Z,,,,,,,,,,,,,,N)})
    
(2,{(2,test2,,,,,,,,dist,,,,,2020-08-02T13:06:16.000Z,east,,,,,,,,,,,,A)},{(2,,,,,,,,,,,,,,2020-09-08T18:31:09.000Z,,,,,,,,,,,,,,N)})

Expected Result:

(1,test,,,,,,,,city,,,,,2020-09-08T19:31:09.000Z,west,,,,,,,,,,,,,N)
(2,test2,,,,,,,,dist,,,,,2020-09-08T18:31:09.000Z,east,,,,,,,,,,,,N)

Solution

  • I was able to achieve the expected output with the statement below:

    final = FOREACH all_data GENERATE flatten($1), flatten($2.(region)) as region, flatten($2.(indicator)) as indicator;
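
    Note that this statement names two fields of the second bag explicitly, so it would need editing whenever the column list changes. For dynamic columns, the rule described in the question (a non-empty field in the second bag replaces the field at the same position in the first bag) could be applied position by position instead. Below is a minimal sketch of that rule in Python rather than Pig Latin. The function name overlay and the shortened sample rows are hypothetical, and it assumes blank fields arrive as empty strings or None. Pig can register Python functions as Jython UDFs, but that wiring is not shown here.

```python
def overlay(first, second):
    """Return a copy of `first` in which every non-empty field of `second` wins.

    Fields are matched by position, so no column names are needed.
    Assumption: a blank field is represented as "" or None.
    """
    # Pad the first row so a longer second row can still be overlaid.
    width = max(len(first), len(second))
    merged = list(first) + [""] * (width - len(first))
    for i, value in enumerate(second):
        if value is not None and value != "":
            merged[i] = value
    return tuple(merged)


# Shortened, hypothetical rows: id, category, blank, region, date, direction, indicator
a_row = ("1", "test", "", "city", "2020-06-08T18:31:09.000Z", "west", "A")
b_row = ("1", "", "", "", "2020-09-08T19:31:09.000Z", "", "N")

print(overlay(a_row, b_row))
```

    With these sample rows, only the date and indicator positions change, because those are the only non-blank fields in the second row besides the id.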