I'm looking for a window function to retrieve a value in a previous record in BigQuery
by comparing a value in the current record until it finds the most recent match. Here is a sample table:
with a as (
select 1 as hitNumber, 'xx' as sessionID, 'aaa' as pageName, null as convPageName
union all
select 2 as hitNumber, 'xx' as sessionID, 'ccc' as pageName, 'bbb' as convPageName
union all
select 3 as hitNumber, 'xx' as sessionID, 'ccc' as pageName, null as convPageName
union all
select 4 as hitNumber, 'xx' as sessionID, 'ddd' as pageName, 'qqq' as convPageName
union all
select 5 as hitNumber, 'xx' as sessionID, 'eee' as pageName, 'ccc' as convPageName
)
select *, ??? as prevConvPageName from a
Based on this example, looking at hitNumber = 5
, I want to know what the convPageName
is for the most recent record where the convPageName
matches the pageName
and the convPageName
on the matching record is not null. In this case, the result would be bbb
because the convPageName
for hitNumber = 5
matches the pageName
for hitNumber = 2
.
This may be solved by using FIRST_VALUE
in combination with joining table a
with it self.
Please note that I added an extra row to table a
in order to prove that the latest record from table a
based on hitNumber, is used.
WITH
a AS (
SELECT
1 AS hitNumber,
'xx' AS sessionID,
'aaa' AS pageName,
NULL AS convPageName
UNION ALL
SELECT 2,'xx','ccc','fff'
UNION ALL
SELECT 3,'xx','ccc','bbb'
UNION ALL
SELECT 4,'xx','ccc', null
UNION ALL
SELECT 5,'xx','ddd', 'qqq'
UNION ALL
SELECT 6,'xx','eee', 'ccc'
)
select distinct
outer_a.*,
FIRST_VALUE(inner_a.convPageName)
OVER (PARTITION BY inner_a.pageName ORDER BY inner_a.hitNumber desc
) as convPageNameMapped
from
a as outer_a
left join a as inner_a on
outer_a.convPageName = inner_a.pageName
and inner_a.convPageName is not null
Result
If only matched records should be included in the result, please replace left join
with join
.
Relevant documentation