Search code examples
sqlarrayshivecastinghiveql

hive create an array from string


In my data I have comma seperated strings. It would be much easier if these would be arrays, so I can easily match them with another array for example. However, I am not able to create an array from the string.

Example code:

create table tmp_array_string as
select '"abc", "def"' as tmp_string
      , array("abc", "def") as tmp_array

select a.*
     , array(tmp_string) as not_a_proper_array
     , size(tmp_array) as array_size
     , size(array(tmp_string) ) not_the_proper_array_size
from tmp_array_string a

Why is array(tmp_string not resulting in an array with 2 elements and is there a way to make this work?

Thanks a lot!


After leftjoins answer, here is the proof of the correctness. :-)

select a.*
     , array(tmp_string) as not_a_proper_array
     , split(regexp_replace(tmp_string,'"',''),',\\s*') as correct_array
     , size(tmp_array) as array_size
     , size(split(regexp_replace(tmp_string,'"',''),',\\s*')) as correct_size
     , size(array(tmp_string) ) not_the_proper_array_size
from tmp_array_string ater code here

correct size results in 2 :-)


Solution

  • Use split(str, ',') to get an array. See dosc: here

    select split('abc, def',',\\s*'); --Delimiter is comma+zero or more spaces
    

    Result:

    OK
    ["abc","def"]
    Time taken: 1.676 seconds, Fetched: 1 row(s)
    

    If your delimited string contains extra double quotes, then remove them using regex_replace:

     select split(regexp_replace('"abc", "def"','"',''),',\\s*');
    

    Result:

    OK
    ["abc","def"]
    Time taken: 3.735 seconds, Fetched: 1 row(s)