I have a table that uses "start dates" or effective dates. The values in the table take effect from the start date onward, until it is overridden by another entry from the same table with a later start date.
My schema and sample data:
CREATE TABLE VALUE_DATA (
`start_date` DATE,
`value` FLOAT
);
INSERT INTO VALUE_DATA (start_date, value) VALUES
('2015-01-01', 10),
('2015-01-03', 20),
('2015-01-08', 30),
('2015-01-09', 15);
Query that produces required results:
SELECT date, value
FROM(
SELECT date, MAX(start_date) as max_start
FROM (
select curdate() - INTERVAL (ones.digit + (10 * tens.digit) + (100 * hundreds.digit)) DAY as date
from (select 0 as digit union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as ones
cross join (select 0 as digit union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as tens
cross join (select 0 as digit union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as hundreds
) DATE_TABLE
LEFT JOIN VALUE_DATA ON (DATE_TABLE.date >= VALUE_DATA.start_date)
WHERE DATE_TABLE.date between '2015-01-01' and '2015-01-10'
GROUP BY date
) START_DATES LEFT JOIN VALUE_DATA ON (START_DATES.max_start = VALUE_DATA.start_date);
I've created an SQL Fiddle to simulate the problem.
Although the SQL Fiddle works (gives correct results), I'm not convinced that it's the best way to do this. The query I had to use is a bit convoluted. I'd ultimately like to create a view for this table that contains the correct value for every day regardless of whether it falls on a start date or not (like the output produced by the Fiddle) to make it easier to join to this table. Obviously, I want to make sure this view is as fast as possible. So my question is, how can this query be improved (optimized) for use in such a view?
You need to be very careful with this type of view. It will be easy to write a view that is good at giving all the individual dates that each record is valid for, but slow when asking which record is valid on one specific date.
(Because to answer the second question involves answering the first question for Each and Every date first, then discarding the failures.)
The following is reasonable at taking a date and returning the rows valid on that date.
CREATE VIEW DAILY_VALUE_DATA AS (
SELECT
DATE_TABLE.date,
VALUE_TABLE.value
FROM
DATE_TABLE
LEFT JOIN
VALUE_DATA
ON VALUE_DATA.start_date = (SELECT MAX(lookup.start_date)
FROM VALUE_DATA lookup
WHERE lookup.start_date <= DATE_TABLE.date
)
);
SELECT * FROM DAILY_VALUE_DATA WHERE date = '2015-08-11'
Note: This assumes DateTable is a real persistent materialised table, not the in-line view you used, use of which will greatly compromise performance.
It also assumes that VALUE_DATA
is indexed by the start_date
.
EDIT:
I also find it likely that your value table will likely have other columns. Let's say that it is a value per person. Maybe their address on any given date.
To extend the query above you then also need to join on the person
table...
CREATE VIEW DAILY_VALUE_DATA AS (
SELECT
PERSON.id AS person_id,
DATE_TABLE.date,
VALUE_TABLE.value
FROM
PERSON
INNER JOIN
DATE_TABLE
ON DATE_TABLE.date >= PERSON.date_of_birth
AND DATE_TABLE.date < COALESCE(PERSON.date_of_death, CURDATE() + 1)
LEFT JOIN
VALUE_DATA
ON VALUE_DATA.start_date = (SELECT MAX(lookup.start_date)
FROM VALUE_DATA lookup
WHERE lookup.start_date <= DATE_TABLE.date
AND lookup.person_id = PERSON.id
)
);
SELECT * FROM DAILY_VALUE_DATA WHERE person_id = 1 AND date = '2015-08-11'
EDIT:
Another alternative to the LEFT JOIN
is to embed the correllated sub-query in the SELECT
block. This is effective when you only have one value to pull from the target table, but less effective if you need to pull many values from the target table...
CREATE VIEW DAILY_VALUE_DATA AS (
SELECT
PERSON.id AS person_id,
DATE_TABLE.date,
(SELECT VALUE_DATA.value
FROM VALUE_DATA
WHERE VALUE_DATA.start_date <= DATE_TABLE.date
AND VALUE_DATA.person_id = PERSON.id
ORDER BY VALUE_DATA.start_date DESC
LIMIT 1
) AS value
FROM
PERSON
INNER JOIN
DATE_TABLE
ON DATE_TABLE.date >= PERSON.date_of_birth
AND DATE_TABLE.date < COALESCE(PERSON.date_of_death, CURDATE() + 1)
);
SELECT * FROM DAILY_VALUE_DATA WHERE person_id = 1 AND date = '2015-08-11'