Search code examples
sqlmysqlsubqueryinner-join

SQL Join based on two different date columns


Given that I have these two following tables:

level_table

id level_date level
300 2021-05 C
300 2020-04 C
300 2019-08 C
300 2022-06 C
300 2019-02 C
300 2018-01 C
300 2023-05 C
300 2021-11 C
300 2021-09 C
300 2021-04 C

protocol_table

id protc_date protc
300 2021-03 1
300 2018-04 2
300 2021-08 3
300 2020-06 4
300 2019-01 5
300 2022-03 6
300 2019-05 7
300 2023-05 8
300 2019-06 9
300 2018-01 10

I'd like to create a third that would be the inner join of those two, on the condition that the level_date has to be <= protc_date, and the level_date has to be the max_value of that given ID, respecting the first condition.

So in this example, my desired output would be the following:

tbl_join

id level_date level protc_date protc
300 2020-04 C 2021-03 1
300 2018-01 C 2018-04 2
300 2021-05 C 2021-08 3
300 2020-04 C 2020-06 4
300 2018-01 C 2019-01 5
300 2021-11 C 2022-03 6
300 2019-02 C 2019-02 7
300 2023-05 C 2023-05 8
300 2019-02 C 2019-06 9
300 2018-01 C 2018-01 10

How can I create a query to perform this operation?

What i've tried so far is this query below, however it does not return me the desired output.

SELECT t1.*,
       t2.protc_date,
       t2.protc,
FROM level_table as t1
INNER JOIN protocol_table as t2
ON t1.id = t2.id
WHERE (t1.level_date <= t2.protc_date
    AND t1.level_date = (
        SELECT MAX(level_date)
        FROM level_table as t1
        WHERE t1.id = t2.id)
    )

This query has returned me only one row, where it has the max(level_date) of my whole table, and not the max(level_date) for each protc and id.

id level_date level protc_date protc
300 2023-05 C 2023-05 8

Solution

  • If you are running MySQL 8.0.13 or higher, I would recommend rewriting the query to use a lateral join; this simplifies the query and makes it more efficient, because it avoids opening the level table multiple times, as in your current code (once in the join, and another in a correlated subquery).

    select l.*, p.protc_date, p.protc
    from protocol_table p
    cross join lateral (
        select l.*
        from level_table l
        where l.id = p.id and l.level_date <= p.protc_date
        order by l.level_date desc 
        limit 1
    ) l
    

    The subquery in the lateral join correlates the level table to the each row of the protocol table, sorts matching records by descending date and retains the top record only - which is then available in the outer query.