Search code examples
mysqlsqlquery-optimization

Building a daily view from a table with an "effective date"


I have a table that uses "start dates" or effective dates. The values in the table take effect from the start date onward, until it is overridden by another entry from the same table with a later start date.

My schema and sample data:

CREATE TABLE VALUE_DATA (
    `start_date` DATE,
    `value` FLOAT
);

INSERT INTO VALUE_DATA (start_date, value) VALUES
('2015-01-01', 10),
('2015-01-03', 20),
('2015-01-08', 30),
('2015-01-09', 15);

Query that produces required results:

SELECT date, value
FROM(
    SELECT date, MAX(start_date) as max_start
    FROM (
        select curdate() - INTERVAL (ones.digit + (10 * tens.digit) + (100 * hundreds.digit)) DAY as date
        from (select 0 as digit union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as ones
        cross join (select 0 as digit union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as tens
        cross join (select 0 as digit union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as hundreds
    ) DATE_TABLE
    LEFT JOIN VALUE_DATA ON (DATE_TABLE.date >= VALUE_DATA.start_date)
    WHERE DATE_TABLE.date between '2015-01-01' and '2015-01-10'
    GROUP BY date
) START_DATES LEFT JOIN VALUE_DATA ON (START_DATES.max_start = VALUE_DATA.start_date);

I've created an SQL Fiddle to simulate the problem.

Although the SQL Fiddle works (gives correct results), I'm not convinced that it's the best way to do this. The query I had to use is a bit convoluted. I'd ultimately like to create a view for this table that contains the correct value for every day regardless of whether it falls on a start date or not (like the output produced by the Fiddle) to make it easier to join to this table. Obviously, I want to make sure this view is as fast as possible. So my question is, how can this query be improved (optimized) for use in such a view?


Solution

  • You need to be very careful with this type of view. It will be easy to write a view that is good at giving all the individual dates that each record is valid for, but slow when asking which record is valid on one specific date.

    (Because to answer the second question involves answering the first question for Each and Every date first, then discarding the failures.)

    The following is reasonable at taking a date and returning the rows valid on that date.

    CREATE VIEW DAILY_VALUE_DATA AS (
        SELECT
            DATE_TABLE.date,
            VALUE_TABLE.value
        FROM
            DATE_TABLE
        LEFT JOIN
            VALUE_DATA
                ON  VALUE_DATA.start_date = (SELECT MAX(lookup.start_date)
                                               FROM VALUE_DATA lookup
                                              WHERE lookup.start_date <= DATE_TABLE.date
                                            )
    );
    
    SELECT * FROM DAILY_VALUE_DATA WHERE date = '2015-08-11'
    

    Note: This assumes DateTable is a real persistent materialised table, not the in-line view you used, use of which will greatly compromise performance.

    It also assumes that VALUE_DATA is indexed by the start_date.


    EDIT:

    I also find it likely that your value table will likely have other columns. Let's say that it is a value per person. Maybe their address on any given date.

    To extend the query above you then also need to join on the person table...

    CREATE VIEW DAILY_VALUE_DATA AS (
        SELECT
            PERSON.id   AS person_id,
            DATE_TABLE.date,
            VALUE_TABLE.value
        FROM
            PERSON
        INNER JOIN
            DATE_TABLE
                ON  DATE_TABLE.date >=          PERSON.date_of_birth
                AND DATE_TABLE.date <  COALESCE(PERSON.date_of_death, CURDATE() + 1)
        LEFT JOIN
            VALUE_DATA
                ON  VALUE_DATA.start_date = (SELECT MAX(lookup.start_date)
                                               FROM VALUE_DATA lookup
                                              WHERE lookup.start_date <= DATE_TABLE.date
                                                AND lookup.person_id   = PERSON.id
                                            )
    );
    
    SELECT * FROM DAILY_VALUE_DATA WHERE person_id = 1 AND date = '2015-08-11'
    


    EDIT:

    Another alternative to the LEFT JOIN is to embed the correllated sub-query in the SELECT block. This is effective when you only have one value to pull from the target table, but less effective if you need to pull many values from the target table...

    CREATE VIEW DAILY_VALUE_DATA AS (
        SELECT
            PERSON.id   AS person_id,
            DATE_TABLE.date,
            (SELECT VALUE_DATA.value
               FROM VALUE_DATA
              WHERE VALUE_DATA.start_date <= DATE_TABLE.date
                AND VALUE_DATA.person_id   = PERSON.id
           ORDER BY VALUE_DATA.start_date DESC
              LIMIT 1
            )   AS value
        FROM
            PERSON
        INNER JOIN
            DATE_TABLE
                ON  DATE_TABLE.date >=          PERSON.date_of_birth
                AND DATE_TABLE.date <  COALESCE(PERSON.date_of_death, CURDATE() + 1)
    );
    
    SELECT * FROM DAILY_VALUE_DATA WHERE person_id = 1 AND date = '2015-08-11'