Given that I have these two following tables:
level_table
id | level_date | level |
---|---|---|
300 | 2021-05 | C |
300 | 2020-04 | C |
300 | 2019-08 | C |
300 | 2022-06 | C |
300 | 2019-02 | C |
300 | 2018-01 | C |
300 | 2023-05 | C |
300 | 2021-11 | C |
300 | 2021-09 | C |
300 | 2021-04 | C |
protocol_table
id | protc_date | protc |
---|---|---|
300 | 2021-03 | 1 |
300 | 2018-04 | 2 |
300 | 2021-08 | 3 |
300 | 2020-06 | 4 |
300 | 2019-01 | 5 |
300 | 2022-03 | 6 |
300 | 2019-05 | 7 |
300 | 2023-05 | 8 |
300 | 2019-06 | 9 |
300 | 2018-01 | 10 |
I'd like to create a third that would be the inner join of those two, on the condition that the level_date has to be <= protc_date, and the level_date has to be the max_value of that given ID, respecting the first condition.
So in this example, my desired output would be the following:
tbl_join
id | level_date | level | protc_date | protc |
---|---|---|---|---|
300 | 2020-04 | C | 2021-03 | 1 |
300 | 2018-01 | C | 2018-04 | 2 |
300 | 2021-05 | C | 2021-08 | 3 |
300 | 2020-04 | C | 2020-06 | 4 |
300 | 2018-01 | C | 2019-01 | 5 |
300 | 2021-11 | C | 2022-03 | 6 |
300 | 2019-02 | C | 2019-02 | 7 |
300 | 2023-05 | C | 2023-05 | 8 |
300 | 2019-02 | C | 2019-06 | 9 |
300 | 2018-01 | C | 2018-01 | 10 |
How can I create a query to perform this operation?
What i've tried so far is this query below, however it does not return me the desired output.
SELECT t1.*,
t2.protc_date,
t2.protc,
FROM level_table as t1
INNER JOIN protocol_table as t2
ON t1.id = t2.id
WHERE (t1.level_date <= t2.protc_date
AND t1.level_date = (
SELECT MAX(level_date)
FROM level_table as t1
WHERE t1.id = t2.id)
)
This query has returned me only one row, where it has the max(level_date) of my whole table, and not the max(level_date) for each protc and id.
id | level_date | level | protc_date | protc |
---|---|---|---|---|
300 | 2023-05 | C | 2023-05 | 8 |
If you are running MySQL 8.0.13 or higher, I would recommend rewriting the query to use a lateral join; this simplifies the query and makes it more efficient, because it avoids opening the level table multiple times, as in your current code (once in the join, and another in a correlated subquery).
select l.*, p.protc_date, p.protc
from protocol_table p
cross join lateral (
select l.*
from level_table l
where l.id = p.id and l.level_date <= p.protc_date
order by l.level_date desc
limit 1
) l
The subquery in the lateral join correlates the level table to the each row of the protocol table, sorts matching records by descending date and retains the top record only - which is then available in the outer query.