
How to get the value for a variable key from a pig map?


Is there a way to look up a value in a map using another field as the key? For example, my company data has locale and name fields like this:

 {"en_US", (["en_US" : "English Name"], ["fr_FR" : "French Name"])}

What I want is to look up the value in the map using locale as the key, since the key will be different for different locales.

company_data = load '/data' using PigStorage();

final_company_data = FOREACH company_data GENERATE
                                             value.locale as locale,
                                             value.name#locale;

The code above gives me an error because, as I understand it, retrieving a value from a map requires a constant key, as in value.name#'en_US'. Is there a way to use locale so that the right key is substituted for each record?

Expected output: final_company_data = {"en_US", "English Name"}

Solution

  • As far as I remember, you can't do that in Pig: the key has to be a static value. So, for example, this should work:

    final_company_data = FOREACH company_data GENERATE
                                             value.locale as locale,
                                             value.name#'en_US';
    

    If the key set is not too big, you can try something like this (although it involves a lot of typing):

    -- Keep only the en_US records, then look up the map with a constant key.
    en = FILTER company_data BY value.locale == 'en_US';
    final_company_data_en = FOREACH en GENERATE
                                             value.locale as locale,
                                             value.name#'en_US';
    -- Same again for the fr_FR records.
    fr = FILTER company_data BY value.locale == 'fr_FR';
    final_company_data_fr = FOREACH fr GENERATE
                                             value.locale as locale,
                                             value.name#'fr_FR';
    

    Do this for every key, then take the union of all the subsets. This solution is ugly, but it works.
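
The filter-then-union approach described above could be sketched end to end roughly as follows. This is only a sketch: it assumes `company_data` is already loaded with a `value` tuple holding `locale` and `name`, as in the question, and the alias names `en_names` and `fr_names` are made up for illustration.

```pig
-- One branch per locale: filter first, then look up the map with a constant key.
en = FILTER company_data BY value.locale == 'en_US';
en_names = FOREACH en GENERATE value.locale AS locale, value.name#'en_US' AS name;

fr = FILTER company_data BY value.locale == 'fr_FR';
fr_names = FOREACH fr GENERATE value.locale AS locale, value.name#'fr_FR' AS name;

-- Combine the per-locale subsets back into a single relation.
final_company_data = UNION en_names, fr_names;
```

Each additional locale needs its own FILTER/FOREACH pair and one more alias in the UNION, which is why this only scales to a small, known set of keys.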