I have two tables, clicks:
user date
a 01/10/2021
a 02/10/2021
a 03/10/2021
b 27/09/2021
c 08/10/2021
c 13/10/2021
and segment:
user date segment
a 26/09/2021 1
a 27/09/2021 1
a 03/10/2021 2
c 01/10/2021 5
c 10/10/2021 6
I want to join segment to clicks to identify the latest segment per user on that given click date. So the correct output will look like this:
user date segment
a 01/10/2021 1
a 02/10/2021 1
a 03/10/2021 2
b 27/09/2021
c 08/10/2021 5
c 13/10/2021 6
I have tried the following but get the error 'this type of correlated subquery pattern is not supported due to internal error'
select *,
(select top 1 segment
from b
where
b.date <= a.date
ORDER BY ROW_NUMBER() OVER(PARTITION BY b.id ORDER BY b.date DESC)) as segment_lookup
from a;
Any help greatly appreciated to find the best method. Thanks in advance
You don't mention the database so I'll assume it's SQL Server.
Also, the first table doesn't seem to have a key so the query below uses (user
, date
) as a key.
You can do:
select *
from (
select
c.*, s.segment,
row_number() over(partition by c.u, c.d order by s.d desc) as rn
from clicks c
left join segment s on s.u = c.u and c.d >= s.d
) x
where rn = 1
Result:
u d segment rn
-- ----------- -------- --
a 2021-10-01 1 1
a 2021-10-02 1 1
a 2021-10-03 2 1
b 2021-09-27 1
c 2021-10-08 5 1
c 2021-10-13 6 1
See running example at db<>fiddle 1.
Alternatively, you can use a lateral join that could theoretically have better performance.
EDIT: Adding lateral join as requested
The query with a lateral join can take the form:
select
c.*, s.segment
from clicks c
outer apply (
select top 1 * from segment s where s.u = c.u and c.d >= s.d
order by s.d desc
) s
Result:
u d segment
-- ----------- -------
a 2021-10-01 1
a 2021-10-02 1
a 2021-10-03 2
b 2021-09-27
c 2021-10-08 5
c 2021-10-13 6
See running example at db<>fiddle 2.
The lateral join could be quite performant if the index segment (u, d)
is created.