Search code examples
hivehiveqlhuehive-query

Separating a column data into multiple columns in hive


I have a sample data for a device which contains two controller and it's version. The sample data is as follows:

device_id   controller_id  versions
123          1             0.1
123          2             0.15
456          2             0.25
143          1             0.35
143          2             0.36

This above data should be in the below format:

device_id   1st_ctrl_id_ver   2nd_ctrl_id_ver
123          0.1              0.15
456          NULL             0.25
143          0.35             0.36

I used the below code which is not working:

select
device_id,
case when controller_id="1" then versions end as 1st_ctrl_id_ver,
case when controller_id="2" then versions end as 2nd_ctrl_id_ver       
from device_versions

The ouput which i got is:

device_id   1st_ctrl_id_ver   2nd_ctrl_id_ver
123          0.1              NULL
123          NULL             0.15
456          NULL             0.25
143          0.35             NULL
143          NULL             0.36

I don't want the Null values in each row.Can someone help me in writing the correct code?


Solution

  • To "fold" all lines with a given key to a single line, you have to run an aggregation. Even if you don't really aggregate values in practise.

    Something like
    select device_id,
    MAX(case when controller_id="1" then versions end) as 1st_ctrl_id_ver,
    MAX(case when controller_id="2" then versions end) as 2nd_ctrl_id_ver
    from device_versions
    GROUP BY device_id

    But be aware that this code will work if and only if you have at most one entry per controller per device, and any controller with a version higher than 2 will be ignored. In other words it is rather brittle (but you can't do better in SQL anway)