
Converting Apache Pig to Hive


I'm trying to figure out what the FLATTEN calls in this code are doing, including the one on "group". For a few days I've been working on and off on converting the code below to Hive, and I just don't get it. Normally, FLATTEN is used to turn two or more columns into multiple rows that share one column name in the output. In this case, though, I'm not sure what it's doing, so I don't know how to replicate it in Hive. Any assistance would be greatly appreciated. I don't have much time to work on this, and I'm expected to complete and test it within the next couple of weeks. Thanks.

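-- Group the input rows by account and strategy code.
Change_pop = GROUP IPChange_pop BY (acct_num, strategy_code);
Oldest_GLChange = FOREACH Change_pop {
    -- Sort each group so the earliest process_date comes first
    -- (ties broken by the highest new_loc), then keep only that row.
    OList = ORDER IPChange_pop BY process_date ASC, new_loc DESC;
    Oldest = LIMIT OList 1;
    -- Inside the FOREACH the grouped bag is named after the input relation,
    -- IPChange_pop (GLChange_pop is not defined anywhere in this snippet).
    -- group is the (acct_num, strategy_code) key; Oldest holds one tuple.
    GENERATE
        FLATTEN(IPChange_pop) as (email, acct_num, acct_nm, cust_num, type, strategy_code, process_date, last_5, cmGroup, current_loc, new_loc, update_ts),
        FLATTEN(group.strategy_code) as grp_strategy_code,
        FLATTEN(Oldest.process_date) as early_process_date,
        FLATTEN(Oldest.new_loc) as early_new_loc;
};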
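The following small Python model may help show what this FOREACH produces at the row level. It assumes the bag is IPChange_pop, that dates are ISO-formatted strings (so text order equals date order), and that none of the key or sort columns is null. It illustrates the Pig semantics and is not Pig or Hive code. `oldest_change` and the sample rows are hypothetical.

```python
from collections import defaultdict


def oldest_change(rows):
    """Model the rows emitted by the Pig FOREACH block above.

    Assumes each row is a dict with at least acct_num, strategy_code,
    process_date (ISO date string) and new_loc, and that none of those
    values is null (Pig's null ordering is not modelled).
    """
    # GROUP IPChange_pop BY (acct_num, strategy_code): one bag per key.
    groups = defaultdict(list)
    for row in rows:
        groups[(row["acct_num"], row["strategy_code"])].append(row)

    result = []
    for (_acct_num, strategy_code), bag in groups.items():
        # ORDER ... BY process_date ASC, new_loc DESC, done as two stable
        # sorts: secondary key first, then primary key.
        ordered = sorted(bag, key=lambda r: r["new_loc"], reverse=True)
        ordered = sorted(ordered, key=lambda r: r["process_date"])
        # LIMIT OList 1: the single oldest record of the group.
        oldest = ordered[0]

        # FLATTEN(IPChange_pop) yields one row per record in the bag. The
        # Oldest projections hold exactly one value each, so flattening them
        # adds columns to every row without multiplying the row count.
        for row in bag:
            out = dict(row)
            out["grp_strategy_code"] = strategy_code
            out["early_process_date"] = oldest["process_date"]
            out["early_new_loc"] = oldest["new_loc"]
            result.append(out)
    return result


# Hypothetical sample data with only the columns the logic depends on.
sample = [
    {"acct_num": 1, "strategy_code": "A", "process_date": "2020-03-01", "new_loc": "X"},
    {"acct_num": 1, "strategy_code": "A", "process_date": "2020-01-15", "new_loc": "P"},
    {"acct_num": 1, "strategy_code": "A", "process_date": "2020-01-15", "new_loc": "Q"},
    {"acct_num": 2, "strategy_code": "B", "process_date": "2021-06-30", "new_loc": "Z"},
]

for r in oldest_change(sample):
    print(r["acct_num"], r["process_date"], r["early_process_date"], r["early_new_loc"])
# 1 2020-03-01 2020-01-15 Q
# 1 2020-01-15 2020-01-15 Q
# 1 2020-01-15 2020-01-15 Q
# 2 2021-06-30 2021-06-30 Z
```

In this model the output has exactly as many rows as the input. Every record keeps its own columns and gains the earliest process_date and new_loc of its (acct_num, strategy_code) group. The new_loc DESC tie-breaker is why group A reports Q rather than P. If that reading matches the real data, one option worth testing in Hive is a window function such as FIRST_VALUE(process_date) OVER (PARTITION BY acct_num, strategy_code ORDER BY process_date ASC, new_loc DESC).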

Solution

  • FLATTEN is used to un-nest tuples, bags, and maps. Off the top of my head, the Hive equivalent would be the EXPLODE() function used together with LATERAL VIEW.

    https://pig.apache.org/docs/latest/basic.html#flatten

    https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-explode
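The un-nesting that FLATTEN on a bag and EXPLODE() with LATERAL VIEW have in common can be sketched in a few lines of Python. This models only the row semantics and is not Pig or Hive code. The `table`, its `tags` column and the `lateral_explode` helper are hypothetical names.

```python
def lateral_explode(rows, column, alias):
    """Return one row per element of row[column], repeating the other columns.

    Rows whose collection is empty produce no output. Pig discards the row
    when FLATTEN meets an empty bag, and a plain (non-OUTER) Hive
    LATERAL VIEW explode() does the same.
    """
    out = []
    for row in rows:
        for item in row[column]:
            # Copy the non-collection columns and attach one element.
            new_row = {k: v for k, v in row.items() if k != column}
            new_row[alias] = item
            out.append(new_row)
    return out


# Hypothetical table with an array/bag column named "tags".
table = [
    {"id": 1, "tags": ["a", "b"]},
    {"id": 2, "tags": []},
    {"id": 3, "tags": ["c"]},
]
print(lateral_explode(table, "tags", "tag"))
# [{'id': 1, 'tag': 'a'}, {'id': 1, 'tag': 'b'}, {'id': 3, 'tag': 'c'}]
```

In HiveQL the same shape comes from `SELECT id, tag FROM t LATERAL VIEW explode(tags) x AS tag`, where `t` and `x` are placeholder table aliases.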