Tags: hadoop, amazon-web-services, apache-pig, emr

In Pig, how do I project a disambiguated field present in a bag?


I have something like this:

  joined = JOIN A BY F1, B BY F1 ;
  joinOutput = FOREACH joined GENERATE A::f3 AS f3, A::f4 AS f4, B::f5 AS f5 ;
  grouped = GROUP joinOutput BY f3 ;
  countOutput = FOREACH grouped GENERATE FLATTEN(joinOutput) , COUNT(joinOutput.f5) AS COUNT ;

If I run DESCRIBE countOutput, I get the following:

 countOutput = { joinOutput::f3 :chararray, joinOutput::f4 :int, COUNT :int }

Now if I try to reference f3 through countOutput, i.e. countOutput.f3, I get an error saying invalid field projection.

So my question is: how do I project field f3 from countOutput?

I haven't tried this yet, so I don't know if it is correct, but I can think of the following:

 countOutput.joinOutput::f3    

I'm not sure whether this is the correct way, though.

Any help is appreciated.


Solution

  • OK, I found a solution after trying out a few things: you can specify the schema explicitly when you FLATTEN.

    So this particular step can be rewritten as follows:

     countOutput = FOREACH grouped GENERATE FLATTEN(joinOutput) AS ( f3 :chararray, f4: int) , COUNT(joinOutput.f5) AS COUNT ;
    

    Now I can reference the flattened fields directly on the outer relation. I hope this helps anyone who runs into the same problem.
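For reference, here is a minimal end-to-end sketch of the approach described above. The input paths ('a.tsv', 'b.tsv'), the LOAD schemas, the type of f5, and the alias cnt are assumptions made up for illustration; they are not from the original script. The AS clause here lists all three fields of joinOutput, since the explicit schema is expected to match the number of fields in the flattened bag.

```pig
-- Hypothetical inputs: the paths and schemas below are assumed for illustration.
A = LOAD 'a.tsv' AS (f1:chararray, f3:chararray, f4:int);
B = LOAD 'b.tsv' AS (f1:chararray, f5:int);

joined     = JOIN A BY f1, B BY f1;
joinOutput = FOREACH joined GENERATE A::f3 AS f3, A::f4 AS f4, B::f5 AS f5;
grouped    = GROUP joinOutput BY f3;

-- Give the flattened fields explicit names so they do not carry
-- the joinOutput:: prefix in the resulting schema.
countOutput = FOREACH grouped GENERATE
                  FLATTEN(joinOutput) AS (f3:chararray, f4:int, f5:int),
                  COUNT(joinOutput.f5) AS cnt;

-- Fields are projected inside a FOREACH over the relation,
-- not with relation.field syntax such as countOutput.f3.
projected = FOREACH countOutput GENERATE f3, cnt;

DESCRIBE projected;
```

Without the explicit AS clause, the prefixed name should still be usable inside a FOREACH, for example FOREACH countOutput GENERATE joinOutput::f3, since the :: operator is how Pig refers to fields that keep their originating alias after a FLATTEN or JOIN.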