Search code examples
sqlgoogle-bigquerywindow-functionsstring-agg

How to concatenate N rows of a column based on a specific column value based on a few conditions in Google BigQuery?


TimeStamp USER ID string_col
1 100001 Here
2 100001 there
5 100001 Apple
1 200002 this is
2 200002 that is
3 200002 Apple
4 200002 some
5 200002 summer
6 200002 winter
8 200002 Apple
9 200002 Apple

That is my raw table & I want to use the word "Apple" as a trigger word to concatenate rows with difference in timestamp lesser 4 from the Apple's timestamp for every userID.

Below table is the output I'm looking for:

TimeStamp USER ID string_col Result
1 100001 Here null
2 100001 there null
5 100001 Apple there
1 200002 this is null
2 200002 that is null
3 200002 Apple this is that is
4 200002 some null
5 200002 summer null
6 200002 winter null
8 200002 Apple summer winter
9 200002 Apple winter

Solution

  • Try this code:

    SELECT
      t1."TimeStamp",
      t1."USER ID",
      t1."string_col",
      (
        SELECT
          STRING_AGG(t2."string_col", ' ')
        FROM
          raw_table AS t2
        WHERE
          t2."USER ID" = t1."USER ID"
          AND t2."TimeStamp" < t1."TimeStamp"
          AND t1."string_col" = 'Apple'
          AND t2."string_col" != 'Apple'
          AND t1."TimeStamp" - t2."TimeStamp" < 4
      ) AS Result
    FROM
      raw_table AS t1
    

    And the output is
    enter image description here