Search code examples
sqlregexhadoophiveimpala

Select the next non empty field in hive


I have a table with 6 fields that looks like this:

Field1 Field2 Field3 Field4 Field5 Field6
ABC    45     XYZ           JKL    BNM
       65            QWE    JKL    
WER           YUI    IOP    GHJ

I want to pull the data from that table above to a new table with 5 fields where we ignore the empty values. My final table should look like this:

Result1 Result2 Result3 Result4 Result5
ABC     45      XYZ     JKL     BNM
65      QWE     JKL
WER     YUI     IOP     GHJ

I have started to write a massive conditional query with CASE WHEN but it's getting out of hand and prone to errors. Is it possible to get that table using a regex_extract query in Hive?


Solution

  • Assuming "empty values" are Nulls

    select  fields[0]   as Field1
           ,fields[1]   as Field2
           ,fields[2]   as Field3
           ,fields[3]   as Field4
           ,fields[4]   as Field5
    
    from   (select  split(concat_ws(string(unhex(1)),*),'\\x01') as fields
            from    mytable
            ) t
    

    +--------+--------+--------+--------+--------+
    | field1 | field2 | field3 | field4 | field5 |
    +--------+--------+--------+--------+--------+
    | ABC    | 45     | XYZ    | JKL    | BNM    |
    | 65     | QWE    | JKL    | (null) | (null) |
    | WER    | YUI    | IOP    | GHJ    | (null) |
    +--------+--------+--------+--------+--------+
    

    Simplified version, assuming comma (,) does not appear in your fields:

    select  ...
    
    from   (select  split(concat_ws(',',*),',') as fields
            from    mytable
            ) t
    

    Assuming "empty values" are empty strings

    select  fields[0]   as Field1
           ,fields[1]   as Field2
           ,fields[2]   as Field3
           ,fields[3]   as Field4
           ,fields[4]   as Field5
    
    from   (select  split(regexp_replace(concat_ws(string(unhex(1)),*),'^\\x01+|\\x01+$|(\\x01)+','$1'),'\\x01') as fields
            from    mytable
            ) t
    

    +--------+--------+--------+--------+--------+
    | field1 | field2 | field3 | field4 | field5 |
    +--------+--------+--------+--------+--------+
    | ABC    | 45     | XYZ    | JKL    | BNM    |
    | 65     | QWE    | JKL    | (null) | (null) |
    | WER    | YUI    | IOP    | GHJ    | (null) |
    +--------+--------+--------+--------+--------+
    

    Simplified version, assuming comma (,) does not appear in your fields:

    select  ...
    
    from   (select  split(regexp_replace(concat_ws(',',*),'^,+|,+$|(,)+','$1'),',') as fields
            from    mytable
            ) t