Search code examples
sqlarraysstringgoogle-bigqueryconcatenation

Big Query SQL concatenating or linking values in piped strings or arrays of strings


I have two fields in a big BQ table. Both fields are strings. Both strings are formatted to represent several values separated by pipes, the same number of values in each string. I need to associate each value in the first set with each value in the second set. For example:

id   names     nums

x    "a|b|c"   "3|9|5"
y    "d"       "1"
z    "e|f"     "4|7"

I need to come out with a result like:

x,a,3
x,b,9
x,c,5
y,d,1
z,e,4
z,f,7

The second input string is a sequence of numbers but I don't mind if this comes out as numeric or string (I'll figure out the casting).
Seems obvious I need to use split() at some point can convert the strings to arrays, but how can I concatenate the arrays from left to right rather than lenght-wise?
I know I can use a double unnest with an index and select only where the indexes are equal, however I don't want to use this method because it makes the already large input table massive (before selecting the equal indexes).
Thanks for any thoughts!


Solution

  • Here is one method that generates a series of subscripts and uses arrays to extract the values:

    select id, split(names, '|')[safe_ordinal(n)] as names, 
           split(nums, '|')[safe_ordinal(n)] as nums
    from (select 'x' as id, 'a|b|c' as names, '3|9|5' as nums union all
          select 'y', 'd', '1' union all
          select 'z', 'e|f', '4|7'
         ) t cross join
         unnest(generate_array(1, array_length(split(names, '|')))) as n;
    

    This uses the length of names to determine how many values there are.