Search code examples
sqlarrayshivehiveql

Normalising Lateral Explode in Hive


I have a table which has array list in a field. I am doing lateral explode on this table to get the elements. But doing this, the values are also getting multiplied.

Table:

Sitedomain      Keyword                             Clicks

msn.com         sports,cricket,accessories           100
yahoo.com       fashion,accessories                   50

Once performing a lateral explode, my output is something like

 Sitedomain     Keyword       Clicks

 msn.com        sports        100
 msn.com        cricket       100
 msn.com        accessories   100
 yahoo.com      fashion        50
 yahoo.com      accessories    50

As you can see, the metrices are also getting exploded. Is there anyway to normalize this data so that the metrics is divided by the number of elements in the array? So the output looks like

 Sitedomain     Keyword       Clicks

  msn.com        sports        33.3
  msn.com        cricket       33.3
  msn.com        accessories   33.3
  yahoo.com      fashion        25
  yahoo.com      accessories    25

Solution

  • Divide clicks by keyword array size:

    with your_table as(
    select stack(2,
    'msn.com',         'sports,cricket,accessories',           100,
    'yahoo.com',       'fashion,accessories',                   50
    ) as (Sitedomain,Keyword,Clicks)
    )
    
    select Sitedomain,k.Keyword,round(s.Clicks/size(Keyword_aray),1) as Clicks
    from
    (
    select Sitedomain,
           split(Keyword,',')  Keyword_aray, 
           Clicks
      from your_table
    )s  lateral view explode(Keyword_aray) k as keyword
    ;
    

    Returns:

    msn.com         sports          33.3
    msn.com         cricket         33.3
    msn.com         accessories     33.3
    yahoo.com       fashion         25.0
    yahoo.com       accessories     25.0
    

    I added round() to get precision like in your example, remove it if not necessary.