Search code examples
regexhadoophivehiveql

Hive : Expand a single record into multiple records when column values are separated by delimiter (~)


I have the following information

id     user                 date                          test

102   123~456~897   01JAN2019~6JUL2018~21DEC2017        abc~qwer~~wer~
103  13~45~88~34   01JAN2020~6JUN2019~21JAN2018~12MAR2017 pqr~~qw~wer*123~qwerty

The result I need, the records must be expanded based on ascending date.

id      user     date            test 

102     897      21DEC2017        wer
102     456      06JUL2018        qwer
102     123      01JAN2019        abc
103     34       12MAR2017        qwerty
103     88       21JAN2018        wer*123
103     45       06JUN2019        qw
103     13       01JAN2020        pqr

I tried with lateral View posexplode, but it's not working properly, please help.

a.id, 
    from demo a
    lateral view posexplode(a.test,'\\~')) t1 as exploded_test,test
    lateral view posexplode(a.user,'\\~')) t2 as exploded_user, user
    lateral view posexplode(a.date,'\\~')) t3 as exploded_date, date
    where exploded_test == exploded_user and exploded_user == exploded_date
    order by 1,3

Solution

  • Tilde (~) is not a metacharacter in Java regex (Hive uses Java regex) and does not need escaping. Also posexplode or explode can be applied to array, you need to split string to get an array before exploding. Use posexplode(split(a.user,'~+')) t1 as (pos,user)

    Also (this is just an opinion): I would not name the array position as "exploded_test". pos or position or simply p would look less confusing