Search code examples
sqlgoogle-bigquerywindow-functions

BigQuery window function to match previous record


I'm looking for a window function to retrieve a value in a previous record in BigQuery by comparing a value in the current record until it finds the most recent match. Here is a sample table:

enter image description here

with a as (
  select 1 as hitNumber, 'xx' as sessionID, 'aaa' as pageName, null as convPageName
  union all
  select 2 as hitNumber, 'xx' as sessionID, 'ccc' as pageName, 'bbb' as convPageName
  union all
  select 3 as hitNumber, 'xx' as sessionID, 'ccc' as pageName, null as convPageName
  union all
  select 4 as hitNumber, 'xx' as sessionID, 'ddd' as pageName, 'qqq' as convPageName
  union all
  select 5 as hitNumber, 'xx' as sessionID, 'eee' as pageName, 'ccc' as convPageName
  
)
select *, ??? as prevConvPageName from a

Based on this example, looking at hitNumber = 5, I want to know what the convPageName is for the most recent record where the convPageName matches the pageName and the convPageName on the matching record is not null. In this case, the result would be bbb because the convPageName for hitNumber = 5 matches the pageName for hitNumber = 2.


Solution

  • This may be solved by using FIRST_VALUE in combination with joining table a with it self.

    Please note that I added an extra row to table a in order to prove that the latest record from table a based on hitNumber, is used.

    WITH
      a AS (
      SELECT
        1 AS hitNumber,
        'xx' AS sessionID,
        'aaa' AS pageName,
        NULL AS convPageName
      UNION ALL
      SELECT 2,'xx','ccc','fff'
      UNION ALL
      SELECT 3,'xx','ccc','bbb'
      UNION ALL
      SELECT 4,'xx','ccc', null
      UNION ALL
      SELECT 5,'xx','ddd', 'qqq'
      UNION ALL
      SELECT 6,'xx','eee', 'ccc'
    )
    
    select distinct
      outer_a.*,
      FIRST_VALUE(inner_a.convPageName)
        OVER (PARTITION BY inner_a.pageName ORDER BY inner_a.hitNumber desc
      ) as convPageNameMapped
    from
      a as outer_a
      left join a as inner_a on 
        outer_a.convPageName = inner_a.pageName
        and inner_a.convPageName is not null  
    

    Result

    enter image description here

    If only matched records should be included in the result, please replace left join with join.

    Relevant documentation

    Navigation functions