Search code examples
mysqlsqldatewindow-functionsrecursive-query

How to carrying over values for missing dates in time series using last value windows analytical functions in mysql


How to carrying over values for missing dates postcode/indicator_category to create full monthly time series. Im trying to use last_value to carry over values but not able to make it. Is my approach correct? Any help would by highly appreciated.

Example given a table:

    create table test.indicator_data(
      postcode text, 
      month_ts date, 
      indicator_cat integer,
      measure double precision);

INSERT INTO value to indicator_data table

    INSERT INTO test.indicator_data 
    VALUES
    ('sw5', '2017-07-01', 2, 99212.231),
    ('sw5', '2018-02-01', 2, 232.215),
    ('sw5', '2017-11-01', 3, 1523.2576),
    ('sw5', '2017-12-01', 3, 152.16),
    ('sw5', '2018-02-01', 3, 142.981),
    ('sw5', '2018-07-01', 3 , 142.1361),
    ('sw5 9', '2018-03-01', 2, 821.21),
    ('sw5 9', '2018-02-01', 2, 1182.19);

INPUT:

postcode month_ts indicator_cat measure
sw5 2017-07-01 2 99212.231
sw5 2018-02-01 2 232.215
sw5 2017-11-01 3 1523.2576
sw5 2017-12-01 3 152.16
sw5 2018-02-01 3 142.981
sw5 2018-07-01 3 142.1361
sw59 2018-03-01 2 821.21
sw59 2018-02-01 2 1182.19

EXPECTED OUTPUT:

postcode month_ts indicator_cat measure
sw5 2017-07-01 2 99212.231
sw5 2017-08-01 2 99212.231
sw5 2017-09-01 2 99212.231
sw5 2017-10-01 2 99212.231
sw5 2017-11-01 2 99212.231
sw5 2017-12-01 2 99212.231
sw5 2018-01-01 2 99212.231
sw5 2018-02-01 2 232.215
sw5 2017-11-01 3 1523.2576
sw5 2017-12-01 3 152.16
sw5 2018-01-01 3 152.16
sw5 2018-02-01 3 142.981
sw5 2018-03-01 3 142.981
sw5 2018-04-01 3 142.981
sw5 2018-05-01 3 142.981
sw5 2018-06-01 3 142.981
sw5 2018-07-01 3 142.1361
sw59 2018-02-01 2 1182.19
sw59 2018-03-01 2 821.21

Tried SOLUTION:

Created Calender table using below procedue:

    DELIMITER |
    CREATE PROCEDURE test.fill_calendar(`start_date` DATE, `end_date` DATE)
    BEGIN
      DECLARE `crt_date` DATE;
      SET `crt_date`=start_date;
      WHILE `crt_date` < `end_date` DO
        INSERT INTO calendar VALUES(`crt_date`);
        SET `crt_date` = ADDDATE(`crt_date`, INTERVAL 1 MONTH);
      END WHILE;
    END |
    DELIMITER ;

Calling Procedure

CALL test.fill_calendar('2017-07-01', '2018-07-01');

Query tried to missing dates but its failing to get last_values

    select postcode,last_value(postcode) over (order by datefield) from (
    SELECT indicator_data.postcode,calendar.datefield AS date,indicator_data.indicator_cat,indicator_data.measure
    FROM indicator_data RIGHT join calendar ON (DATE(indicator_data.month_ts) = calendar.datefield)
    WHERE (calendar.datefield BETWEEN (SELECT MIN(DATE(month_ts)) FROM indicator_data) AND (SELECT MAX(DATE(month_ts)) FROM indicator_data))
    GROUP BY date) as test;

Solution

  • You can create the calendar table's resultset with a recursive CTE.
    Then do a LEFT join of the calendar to the table and use SUM() window function to create groups of rows with null in measure.
    Finally use MAX() window function to pick the last non-null measure:

    WITH  
      RECURSIVE dates AS (
        SELECT postcode, indicator_cat, MIN(month_ts) month_ts, MAX(month_ts) max_month_ts 
        FROM indicator_data
        GROUP BY postcode, indicator_cat
        UNION ALL
        SELECT postcode, indicator_cat, month_ts + INTERVAL 1 MONTH, max_month_ts 
        FROM dates 
        WHERE month_ts + INTERVAL 1 MONTH <= max_month_ts
      ),
      cte AS (
        SELECT d.postcode, d.month_ts, d.indicator_cat, i.measure,
               SUM(i.measure IS NOT NULL) OVER (PARTITION BY d.postcode, d.indicator_cat ORDER BY d.month_ts) grp
        FROM dates d LEFT JOIN indicator_data i
        ON (i.postcode, i.indicator_cat, i.month_ts) = (d.postcode, d.indicator_cat, d.month_ts)
      )
    SELECT postcode, month_ts, indicator_cat,  
           MAX(measure) OVER (PARTITION BY postcode, indicator_cat, grp) measure   
    FROM cte
    ORDER BY postcode, indicator_cat, month_ts;
    

    See the demo.