Search code examples
google-bigquerystring-aggregation

How can I write a conditional string_agg function in BigQuery?


I've written a query that looks at user journeys through my site, using the string_agg() function and hits.contentGroup.contentGroup1 to group together product pages, advice pages, news pages and so on in sequential order, ordering them using the hit number and only looking at 'PAGE' hit types.

However, sign up pages aren't grouped in the same way, so this results in '(not set)' values appearing throughout the user journey in the query's output table, if the user has sign up for an account.

Is there a way to apply conditions to the string_agg function, so that if the content group is 'not set', it can look at the corresponding page path, and if the page path matches LIKE '%join/complete%' for example, return 'Join Complete' or a result other than 'not set'?

This is my best attempt, but it only returns the results that contain join/complete, and not the rest of the user journey.

    SELECT
      date,
      fullVisitorId,
      visitId,
      STRING_AGG(
      CASE 
        WHEN hit.contentGroup.contentGroup1 = '(not set)' THEN 
        CASE 
          WHEN hit.page.pagePath LIKE '%join/complete%' THEN 'join_complete' 
          ELSE hit.contentGroup.contentGroup1 
        END 
      END
      , '#' ORDER BY hit.hitNumber)
    FROM `xxxxx-ga-datasets.xxxx.ga_sessions_*` AS t, 
    UNNEST(hits) AS hit
    WHERE
      _TABLE_SUFFIX BETWEEN '20190101' AND '20190101'
      AND hit.type='PAGE'
    GROUP BY 1, 2, 3
    ORDER BY 1, 2

Any advice appreciated! Cheers.


Solution

  • Because your case-when construct returns NULL if the content group is not not set. You can simplify it to

        SELECT
          date,
          fullVisitorId,
          visitId,
          STRING_AGG(
          CASE 
            WHEN hit.page.pagePath LIKE '%join/complete%' THEN 'join_complete' 
            ELSE hit.contentGroup.contentGroup1  
          END
          , '#' ORDER BY hit.hitNumber)
        FROM `xxxxx-ga-datasets.xxxx.ga_sessions_*` AS t, 
        UNNEST(hits) AS hit
        WHERE
          _TABLE_SUFFIX BETWEEN '20190101' AND '20190101'
          AND hit.type='PAGE'
        GROUP BY 1, 2, 3
        ORDER BY 1, 2