I am using MariaDB sql. I am creating a statistical functionality in my website that needs to fetch the historical data for the user aggregated on each day basis. For this I have created a history table, that contains rows for only when a change occurred. And I also have my "main" table, which contains the current data as is today.
So I need my query to retrieve the status id for each user_id and organisation_id combination from the last row if this value was not null.
To illustrate, example this data, this is considered the "current" data (table name = organisation_user_link):
id | user_id | organisation_id | status_id | stopped_reason_id | dossier_created |
---|---|---|---|---|---|
1 | 3 | 73 | 2 | NULL | 2021-10-29 07:50:21 |
2 | 9 | 1199 | 4 | 5 | 2021-05-19 17:44:07 |
And next to this I have my history data, which looks very similar (table name = organisation_user_link_status_history):
timestamp | user_id | organisation_id | status_id | stopped_reason_id |
---|---|---|---|---|
2024-03-11 12:05:30 | 3 | 73 | 1 | NULL |
2024-03-08 11:15:35 | 3 | 73 | 3 | NULL |
2024-03-05 13:25:40 | 3 | 73 | 4 | 3 |
2024-03-13 02:07:10 | 9 | 1199 | 1 | NULL |
2024-03-11 02:07:10 | 9 | 1199 | 2 | NULL |
I want my result to include every day starting from today until a certain date. Where each day has the value from the previous row, in case there was no value for that day. The values are ordered DESC, so the "current" data is always first, since this is the data as on date TODAY.
This is the result I want to become:
date | user_id | organisation_id | status_id | stopped_reason_id | dossier_created |
---|---|---|---|---|---|
2024-03-14 | 3 | 73 | 2 | NULL | 2021-10-29 |
2024-03-14 | 9 | 1199 | 4 | 5 | 2021-05-19 |
2024-03-13 | 3 | 73 | 2 | NULL | 2021-10-29 |
2024-03-13 | 9 | 1199 | 1 | NULL | 2021-05-19 |
2024-03-12 | 3 | 73 | 2 | NULL | 2021-10-29 |
2024-03-12 | 9 | 1199 | 1 | NULL | 2021-05-19 |
2024-03-11 | 3 | 73 | 1 | NULL | 2021-10-29 |
2024-03-11 | 9 | 1199 | 2 | NULL | 2021-05-19 |
2024-03-10 | 3 | 73 | 1 | NULL | 2021-10-29 |
2024-03-10 | 9 | 1199 | 2 | NULL | 2021-05-19 |
2024-03-09 | 3 | 73 | 1 | NULL | 2021-10-29 |
2024-03-09 | 9 | 1199 | 2 | NULL | 2021-05-19 |
2024-03-08 | 3 | 73 | 3 | NULL | 2021-10-29 |
2024-03-08 | 9 | 1199 | 2 | NULL | 2021-05-19 |
2024-03-07 | 3 | 73 | 3 | NULL | 2021-10-29 |
2024-03-07 | 9 | 1199 | 2 | NULL | 2021-05-19 |
2024-03-06 | 3 | 73 | 3 | NULL | 2021-10-29 |
2024-03-06 | 9 | 1199 | 2 | NULL | 2021-05-19 |
2024-03-05 | 3 | 73 | 4 | 3 | 2021-10-29 |
2024-03-05 | 9 | 1199 | 2 | NULL | 2021-05-19 |
2024-03-04 | 3 | 73 | 4 | 3 | 2021-10-29 |
2024-03-04 | 9 | 1199 | 2 | NULL | 2021-05-19 |
2024-03-03 | 3 | 73 | 4 | 3 | 2021-10-29 |
2024-03-03 | 9 | 1199 | 2 | NULL | 2021-05-19 |
2024-03-02 | 3 | 73 | 4 | 3 | 2021-10-29 |
2024-03-02 | 9 | 1199 | 2 | NULL | 2021-05-19 |
2024-03-01 | 3 | 73 | 4 | 3 | 2021-10-29 |
2024-03-01 | 9 | 1199 | 2 | NULL | 2021-05-19 |
This is the query I have right now:
WITH RECURSIVE dates (
DATE
) AS (
-- SELECT MIN(DATE(created))
-- FROM organisation
SELECT DATE('2024-03-01')
UNION ALL
SELECT DATE(date) + INTERVAL 1 DAY
FROM dates
WHERE DATE(DATE) < (NOW() - INTERVAL 1 DAY)
),
current_history_data_query AS (
SELECT
current_history_data.*
FROM (
SELECT
DATE(timestamp) AS date,
user_id,
organisation_id,
status_id,
stopped_reason_id,
dossier_created,
'history-data' AS src
FROM (
SELECT
oulsh.user_id,
oulsh.organisation_id,
oulsh.timestamp,
oulsh.status_id,
oulsh.stopped_reason_id,
oul.dossier_created,
ROW_NUMBER() OVER (PARTITION BY oulsh.user_id, oulsh.organisation_id, DATE(oulsh.timestamp) ORDER BY oulsh.timestamp DESC) AS row_num
FROM organisation_user_link_status_history AS oulsh
INNER JOIN organisation_user_link AS oul ON oulsh.user_id = oul.user_id AND oulsh.organisation_id = oul.organisation_id
) AS numbered_rows
WHERE row_num = 1 AND DATE(timestamp) != DATE(NOW())
UNION ALL
SELECT DATE(NOW()) AS date, oul.user_id, oul.organisation_id, oul.status_id, oul.stopped_reason_id, oul.dossier_created, 'current-data' AS src
FROM organisation_user_link AS oul
) AS current_history_data
ORDER BY DATE DESC
)
SELECT
dates.date AS dates_date,
COALESCE(user_id, LAG(user_id) OVER (ORDER BY dates_date DESC)) AS user_id,
COALESCE(organisation_id, LAG(organisation_id) OVER (ORDER BY dates_date DESC)) AS organisation_id,
COALESCE(status_id, LAG(status_id) OVER (ORDER BY dates_date DESC)) AS status_id,
COALESCE(stopped_reason_id, LAG(stopped_reason_id) OVER (ORDER BY dates_date DESC)) AS stopped_reason_id,
COALESCE(dossier_created, LAG(dossier_created) OVER (ORDER BY dates_date DESC)) AS dossier_created
FROM dates
LEFT JOIN current_history_data_query AS chdq ON dates.date = chdq.date
GROUP BY DATE(dates.date)
ORDER BY dates.date DESC;
Using this query, there are multiple problems:
What am I missing here, how can I solve this query ?
As MariaDB does not support LATERAL or CROSS APPLY, you could use three separate correlated subqueries in the select list:
SELECT d.date, u.user_id, u.organisation_id,
(
SELECT status_id
FROM current_history_data_query
WHERE user_id = u.user_id
AND organisation_id = u.organisation_id
AND date >= d.date
ORDER BY date ASC
LIMIT 1
) AS status_id,
(
SELECT stopped_reason_id
FROM current_history_data_query
WHERE user_id = u.user_id
AND organisation_id = u.organisation_id
AND date >= d.date
ORDER BY date ASC
LIMIT 1
) AS stopped_reason_id,
(
SELECT dossier_created
FROM current_history_data_query
WHERE user_id = u.user_id
AND organisation_id = u.organisation_id
AND date >= d.date
ORDER BY date ASC
LIMIT 1
) AS dossier_created
FROM dates d
JOIN (SELECT DISTINCT user_id, organisation_id FROM organisation_user_link) u
ORDER BY d.date DESC, u.user_id;
Here's a db<>fiddle.
There is probably a better solution but one way you could achieve your desired output is to use a join to a lateral derived table. The modified final query would be:
SELECT d.date, u.user_id, u.organisation_id, status_id, stopped_reason_id, DATE(dossier_created) AS dossier_created
FROM dates d
JOIN (SELECT DISTINCT user_id, organisation_id FROM organisation_user_link) u
JOIN LATERAL (
SELECT *
FROM current_history_data_query
WHERE user_id = u.user_id
AND organisation_id = u.organisation_id
AND date >= d.date
ORDER BY date ASC
LIMIT 1
) h
ORDER BY d.date DESC, u.user_id;
Here's a db<>fiddle.