hadoop, hive, apache-pig, transform

Pig - Store a relation with a complex schema in a Hive table


Here is my problem. I loaded a relation from Hive and applied a couple of transformations to it. I want to store the final relation back in Hive after the analysis, but I can't. The code below should make this clearer.

First, I LOAD the relation from Hive and transform it:

july = LOAD 'POC.july' USING org.apache.hive.hcatalog.pig.HCatLoader();
july_cl = FOREACH july GENERATE GetDay(ToDate(start_date)) as day:int,start_station,duration;
jul_cl_fl = FILTER july_cl BY day==31;
july_gr = GROUP jul_cl_fl BY (day,start_station); 
july_result = FOREACH july_gr { 
           total_dura = SUM(jul_cl_fl.duration); 
           avg_dura = AVG(jul_cl_fl.duration); 
           qty_trips = COUNT(jul_cl_fl); 
           GENERATE FLATTEN(group),total_dura,avg_dura,qty_trips;
 };

When I try to store the relation july_result, it fails. I suppose the schema has changed and is no longer compatible with the Hive table:

STORE july_result INTO 'poc.july_analysis' USING org.apache.hive.hcatalog.pig.HCatStorer();

I also tried to declare an explicit schema for the final relation, but I couldn't get it to work:

july_result = FOREACH july_gr {
              total_dura = SUM(jul_cl_fl.duration);
              avg_dura = AVG(jul_cl_fl.duration);
              qty_trips = COUNT(jul_cl_fl);
              GENERATE FLATTEN(group) as (day:int),total_dura as (total_dura:int),avg_dura as (avg_dura:int),qty_trips as (qty_trips:int);
              };

Solution

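  • After some research in the Hortonworks community, I found out how to define the output schema for a grouped relation in Pig. FLATTEN(group) now names both fields of the group key, and each aggregate is cast explicitly to the type it is declared as. My new code looks like this: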

    july_result = FOREACH july_gr {
                  total_dura = SUM(jul_cl_fl.duration);
                  avg_dura = AVG(jul_cl_fl.duration);
                  qty_trips = COUNT(jul_cl_fl);
                  GENERATE FLATTEN(group) AS (day, code_station),
                           (int)total_dura as (total_dura:int),
                           (float)avg_dura as (avg_dura:float),
                           (int)qty_trips as (qty_trips:int);
                  };
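
    As a quick check before storing, the schema that Pig inferred for july_result can be printed with DESCRIBE and compared with the Hive table. This is only a sketch: it assumes that poc.july_analysis already exists and that its columns are named day, code_station, total_dura, avg_dura and qty_trips, which the question does not show.

    ```pig
    -- Print the schema Pig inferred for the final relation.
    DESCRIBE july_result;

    -- Store only once the field names and types line up with the Hive table.
    -- Assumption: poc.july_analysis exists with columns named
    -- day, code_station, total_dura, avg_dura and qty_trips.
    STORE july_result INTO 'poc.july_analysis' USING org.apache.hive.hcatalog.pig.HCatStorer();
    ```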
    

    Thanks guys.