Search code examples
sqlmariadbwindow-functions

Fetch values from previous non-null value rows


I am using MariaDB sql. I am creating a statistical functionality in my website that needs to fetch the historical data for the user aggregated on each day basis. For this I have created a history table, that contains rows for only when a change occurred. And I also have my "main" table, which contains the current data as is today.

So I need my query to retrieve the status id for each user_id and organisation_id combination from the last row if this value was not null.

To illustrate, example this data, this is considered the "current" data (table name = organisation_user_link):

id user_id organisation_id status_id stopped_reason_id dossier_created
1 3 73 2 NULL 2021-10-29 07:50:21
2 9 1199 4 5 2021-05-19 17:44:07

And next to this I have my history data, which looks very similar (table name = organisation_user_link_status_history):

timestamp user_id organisation_id status_id stopped_reason_id
2024-03-11 12:05:30 3 73 1 NULL
2024-03-08 11:15:35 3 73 3 NULL
2024-03-05 13:25:40 3 73 4 3
2024-03-13 02:07:10 9 1199 1 NULL
2024-03-11 02:07:10 9 1199 2 NULL

I want my result to include every day starting from today until a certain date. Where each day has the value from the previous row, in case there was no value for that day. The values are ordered DESC, so the "current" data is always first, since this is the data as on date TODAY.

This is the result I want to become:

date user_id organisation_id status_id stopped_reason_id dossier_created
2024-03-14 3 73 2 NULL 2021-10-29
2024-03-14 9 1199 4 5 2021-05-19
2024-03-13 3 73 2 NULL 2021-10-29
2024-03-13 9 1199 1 NULL 2021-05-19
2024-03-12 3 73 2 NULL 2021-10-29
2024-03-12 9 1199 1 NULL 2021-05-19
2024-03-11 3 73 1 NULL 2021-10-29
2024-03-11 9 1199 2 NULL 2021-05-19
2024-03-10 3 73 1 NULL 2021-10-29
2024-03-10 9 1199 2 NULL 2021-05-19
2024-03-09 3 73 1 NULL 2021-10-29
2024-03-09 9 1199 2 NULL 2021-05-19
2024-03-08 3 73 3 NULL 2021-10-29
2024-03-08 9 1199 2 NULL 2021-05-19
2024-03-07 3 73 3 NULL 2021-10-29
2024-03-07 9 1199 2 NULL 2021-05-19
2024-03-06 3 73 3 NULL 2021-10-29
2024-03-06 9 1199 2 NULL 2021-05-19
2024-03-05 3 73 4 3 2021-10-29
2024-03-05 9 1199 2 NULL 2021-05-19
2024-03-04 3 73 4 3 2021-10-29
2024-03-04 9 1199 2 NULL 2021-05-19
2024-03-03 3 73 4 3 2021-10-29
2024-03-03 9 1199 2 NULL 2021-05-19
2024-03-02 3 73 4 3 2021-10-29
2024-03-02 9 1199 2 NULL 2021-05-19
2024-03-01 3 73 4 3 2021-10-29
2024-03-01 9 1199 2 NULL 2021-05-19

This is the query I have right now:

WITH RECURSIVE dates (
    DATE
) AS (
    -- SELECT MIN(DATE(created))
    -- FROM organisation
    SELECT DATE('2024-03-01')
    UNION ALL
    SELECT DATE(date) + INTERVAL 1 DAY
    FROM dates
    WHERE DATE(DATE) < (NOW() - INTERVAL 1 DAY)
),
current_history_data_query AS (
    SELECT 
        current_history_data.*
    FROM (
        SELECT
           DATE(timestamp) AS date,
           user_id,
           organisation_id,
           status_id,
           stopped_reason_id,
           dossier_created,
           'history-data' AS src
         FROM (
           SELECT
               oulsh.user_id,
               oulsh.organisation_id,
               oulsh.timestamp,
               oulsh.status_id,
               oulsh.stopped_reason_id,
               oul.dossier_created,
               ROW_NUMBER() OVER (PARTITION BY oulsh.user_id, oulsh.organisation_id, DATE(oulsh.timestamp) ORDER BY oulsh.timestamp DESC) AS row_num
           FROM organisation_user_link_status_history AS oulsh
           INNER JOIN organisation_user_link AS oul ON oulsh.user_id = oul.user_id AND oulsh.organisation_id = oul.organisation_id
         ) AS numbered_rows
         WHERE row_num = 1 AND DATE(timestamp) != DATE(NOW())
        
         UNION ALL
        
         SELECT DATE(NOW()) AS date, oul.user_id, oul.organisation_id, oul.status_id, oul.stopped_reason_id, oul.dossier_created, 'current-data' AS src
         FROM organisation_user_link AS oul
    ) AS current_history_data
    ORDER BY DATE DESC
)
SELECT
    dates.date AS dates_date,
    COALESCE(user_id, LAG(user_id) OVER (ORDER BY dates_date DESC)) AS user_id,
    COALESCE(organisation_id, LAG(organisation_id) OVER (ORDER BY dates_date DESC)) AS organisation_id,
    COALESCE(status_id, LAG(status_id) OVER (ORDER BY dates_date DESC)) AS status_id,
    COALESCE(stopped_reason_id, LAG(stopped_reason_id) OVER (ORDER BY dates_date DESC)) AS stopped_reason_id,
    COALESCE(dossier_created, LAG(dossier_created) OVER (ORDER BY dates_date DESC)) AS dossier_created
FROM dates
LEFT JOIN current_history_data_query AS chdq ON dates.date = chdq.date
GROUP BY DATE(dates.date)
ORDER BY dates.date DESC;

Using this query, there are multiple problems:

  • For the first row after an original non-null value row: the data is correctly being copied. However, afterwards, the third row for example will still be NULL. Even though the second row was just filled by the GAP() window function.
  • The code as provided above does not take into account that I also need it to be partitioned by user_id and organisation_id. If i add in the LAG() function : PARTITION BY user_id, organisation_id . Then the entire LAG function is not working anymore, and not even my 2nd row is getting the data filled in.

What am I missing here, how can I solve this query ?


Solution

  • As MariaDB does not support LATERAL or CROSS APPLY, you could use three separate correlated subqueries in the select list:

    SELECT d.date, u.user_id, u.organisation_id,
      (
        SELECT status_id
        FROM current_history_data_query
        WHERE user_id = u.user_id
        AND organisation_id = u.organisation_id
        AND date >= d.date
        ORDER BY date ASC
        LIMIT 1
      ) AS status_id,
      (
        SELECT stopped_reason_id
        FROM current_history_data_query
        WHERE user_id = u.user_id
        AND organisation_id = u.organisation_id
        AND date >= d.date
        ORDER BY date ASC
        LIMIT 1
      ) AS stopped_reason_id,
      (
        SELECT dossier_created
        FROM current_history_data_query
        WHERE user_id = u.user_id
        AND organisation_id = u.organisation_id
        AND date >= d.date
        ORDER BY date ASC
        LIMIT 1
      ) AS dossier_created
    FROM dates d
    JOIN (SELECT DISTINCT user_id, organisation_id FROM organisation_user_link) u
    ORDER BY d.date DESC, u.user_id;
    

    Here's a db<>fiddle.


    There is probably a better solution but one way you could achieve your desired output is to use a join to a lateral derived table. The modified final query would be:

    SELECT d.date, u.user_id, u.organisation_id, status_id, stopped_reason_id, DATE(dossier_created) AS dossier_created
    FROM dates d
    JOIN (SELECT DISTINCT user_id, organisation_id FROM organisation_user_link) u
    JOIN LATERAL (
        SELECT *
        FROM current_history_data_query
        WHERE user_id = u.user_id
        AND organisation_id = u.organisation_id
        AND date >= d.date
        ORDER BY date ASC
        LIMIT 1
    ) h
    ORDER BY d.date DESC, u.user_id;
    

    Here's a db<>fiddle.