Search code examples
sqljoinpartitionrow-number

Join on most recent date and by partition


I have two tables, clicks:

user    date
a   01/10/2021
a   02/10/2021
a   03/10/2021
b   27/09/2021
c   08/10/2021
c   13/10/2021

and segment:

user    date    segment
a   26/09/2021  1
a   27/09/2021  1
a   03/10/2021  2
c   01/10/2021  5
c   10/10/2021  6

I want to join segment to clicks to identify the latest segment per user on that given click date. So the correct output will look like this:

user    date    segment
a   01/10/2021  1
a   02/10/2021  1
a   03/10/2021  2
b   27/09/2021  
c   08/10/2021  5
c   13/10/2021  6

I have tried the following but get the error 'this type of correlated subquery pattern is not supported due to internal error'

    select *,
   (select top 1 segment
    from b
    where
          b.date <= a.date
    ORDER BY ROW_NUMBER() OVER(PARTITION BY b.id ORDER BY b.date DESC)) as segment_lookup
from a;

Any help greatly appreciated to find the best method. Thanks in advance


Solution

  • You don't mention the database so I'll assume it's SQL Server.

    Also, the first table doesn't seem to have a key so the query below uses (user, date) as a key.

    You can do:

    select *
    from (
      select
        c.*, s.segment,
        row_number() over(partition by c.u, c.d order by s.d desc) as rn
      from clicks c
      left join segment s on s.u = c.u and c.d >= s.d
    ) x
    where rn = 1
    

    Result:

     u  d           segment  rn 
     -- ----------- -------- -- 
     a  2021-10-01  1        1  
     a  2021-10-02  1        1  
     a  2021-10-03  2        1  
     b  2021-09-27           1  
     c  2021-10-08  5        1  
     c  2021-10-13  6        1  
    

    See running example at db<>fiddle 1.

    Alternatively, you can use a lateral join that could theoretically have better performance.

    EDIT: Adding lateral join as requested

    The query with a lateral join can take the form:

    select
      c.*, s.segment
    from clicks c
    outer apply (
      select top 1 * from segment s where s.u = c.u and c.d >= s.d
      order by s.d desc
    ) s
    

    Result:

     u  d           segment 
     -- ----------- ------- 
     a  2021-10-01  1       
     a  2021-10-02  1       
     a  2021-10-03  2       
     b  2021-09-27          
     c  2021-10-08  5       
     c  2021-10-13  6       
    

    See running example at db<>fiddle 2.

    The lateral join could be quite performant if the index segment (u, d) is created.