Search code examples
sqlhadoopapache-sparkhiveapache-pig

how to make row data to source and target zigzag using hive or pig


Input

id,name,time
1,home,10:20
1,product,10:21
1,mobile,10:22
2,id,10:24
2,bag,10:30
2,home,10:21
3,keyboard,10:32
3,home,10:33
3,welcome,10:36

I want to make name column as source and target based on the below output. Earlier I tried with pig

The steps are:

a=load-->b=asc->c=dec -> then join the data

I got the output like this

(1,home,10:20,1,product,10:21)
(2,bag,10:30,2,id,10:24)
(3,home,10:32,3,welcome,10:36)

output

1,home,product
1,product,mobile
2,id,bag
2,bag,home
3,keyboard,home
3,home,welcome

Solution

  • In Hive (and in Spark), you can use Window function LEAD :

    with t as 
       ( select id, name, lead(name)  over (partition by id) as zigzag from table) 
    select * from t where t.zigzag is not null 
    

    Should give you the output :

    +---+--------+-------+
    | id|    name| zigzag|
    +---+--------+-------+
    |  1|    home|product|
    |  1| product| mobile|
    |  2|     bag|   home|
    |  2|    home|     id|
    |  3|keyboard|   home|
    |  3|    home|welcome|
    +---+--------+-------+