Search code examples
sqlarraysgoogle-bigqueryconcatenationtranspose

bigquery transpose and concatenate for each record


I want to achieve the following transformation. I have last_name stored in a repeated record as follows.

data before transformation

I want to achieve the following. data after transformation

Example with sample data created.

create or replace table t1.cte1 as
WITH t1 AS (
  SELECT 1 as id,'eren' AS last_name UNION ALL
  SELECT 1 as id,'yilmaz' AS last_name UNION ALL
  SELECT 1 as id,'kaya' AS last_name
)
SELECT id,ARRAY_AGG(STRUCT(last_name)) AS last_name_rec
FROM t1
GROUP BY id;

with test as (
select x.id, x.lname_agg,y.last_name  from
(
select id, STRING_AGG(h.last_name,' ') lname_agg FROM
  t1.cte1
  LEFT JOIN
  UNNEST(last_name_rec) AS h
  group by id
  ) x,
  (select id,h.last_name last_name  FROM
  t1.cte1
  LEFT JOIN
  UNNEST(last_name_rec) AS h
  group by last_name,id) y
) select id ,sp.string_flatten_dedup( lname_agg,' ') concat_last_name, last_name from test;

I'm not sure either if I should store it as an array instead of a concatenated field but it would be good to know how to achieve both. storing the concat_last_name as an array

I have achieved the first transformation as follows but I had to dedup the concatenated field with a function I wrote. I'm sure there is a much better way of achieving this.

       with test as (
select x.id id, x.lname_agg,y.last_name  from 
(
select id, STRING_AGG(h.last_name,' ') lname_agg FROM
  small_test
  LEFT JOIN
  UNNEST(last_name_rec) AS h
  group by id
  ) x,
  (select id,h.last_name last_name  FROM
  small_test
  LEFT JOIN
  UNNEST(last_name_rec) AS h group by last_name,id) y
) select id ,sp.string_flatten_dedup( lname_agg,' ') concat_last_name, last_name from test;

The function. string_flatten_dedup

CREATE OR REPLACE FUNCTION
sp.string_flatten_dedup(string_value string,
    delim string) AS
(
                            ARRAY_TO_STRING
                                (ARRAY(SELECT distinct string_value
                                       FROM UNNEST(SPLIT(string_value, delim)) AS string_value
                                       order by string_value desc, string_value),
                                 delim)
);

before using function. intermediate results.

Final result after applying dedup function. final output

Updated table structure. t1.ccte1

Yours works but I got the table structure incorrect when I first posted.

create or replace table t1.cte2 as
with your_table as (
  select 1 id, ['brown', 'smith', 'jones'] last_name union all 
  select 2, ['ryan', 'murphy']
) select id, ln as last_name,
   array_to_string(last_name, ',') as concat_last_name,
from your_table, unnest(last_name) ln;

select id, ln as last_name,
   array_to_string(last_name, ',') as concat_last_name,
from t1.cte2, unnest(last_name) ln;

--fails as its not the structure I thought it was cte1 is different then cte2
select id, ln.last_name
   --array_to_string(last_name, ',') as concat_last_name,
from t1.cte1, unnest(last_name_rec) ln;

Solution

  • Consider below approach

    select id, ln as last_name, 
       array_to_string(last_name, ',') as concat_last_name,
    from your_table, unnest(last_name) ln
    

    if applied to sample data in your question data before transformation

    with your_table as (
      select 1 id, ['brown', 'smith', 'jones'] last_name union all 
      select 2, ['ryan', 'murphy']
    )
    

    output is

    enter image description here

    In case if you want last names as an array - you already have this array - see below for how to use it

    select id, ln as last_name, 
      last_name as concat_last_name,
    from your_table, unnest(last_name) ln
    

    with output

    enter image description here