I have a table that records a row for each time a score for a location has changed.
score_history:
This was done with an eye on efficiency and being able to simply retrieve a list of changes for a given location and serves that purpose nicely.
I'm trying to output the data in a very redundant format to help load it into a rigid external system. The external system expects a row for each location * every date. The goal is to represent the last score value for each location for each date. So if the score changed 3 times in a given date only the score closest to midnight would be considered that locations closing score for the day. I imagine this is similar to the challenge of creating a close of business inventory level fact table.
I have a handy star schema style date dimension table which has a row for every date fully covering this sample period and well into the future.
That table looks like
dw_dim_date:
So, if I had only 3 records in the score_history table...
1, 2019-01-01:10:13:01, 100, 5.0
2, 2019-01-05:20:00:01, 100, 5.8
3, 2019-01-05:23:01:22, 100, 6.2
The desired output would be:
2019-01-01, 100, 5.0
2019-01-02, 100, 5.0
2019-01-03, 100, 5.0
2019-01-04, 100, 5.0
2019-01-05, 100, 6.2
3 Requirements:
I've been chasing my tail through subqueries and window functions.
Because I'm hesitant to post something without something I tried I'll share this trainwreck which produces output but of no meaning...
SELECT dw_dim_date.date,
(SELECT score
FROM score_history
WHERE score_history.happened_at::DATE < dw_dim_date.date
OR score_history.happened_at::DATE = dw_dim_date.date
ORDER BY score_history.id desc limit 1) as last_score
FROM dw_dim_date
WHERE dw_dim_date.date > '2019-06-01'
Grateful for guidance or pointers to other questions to read.
You could achieve it with usage of correlated subqueries and LATERAL
:
SELECT sub.date, sub.location_id, score
FROM (SELECT * FROM dw_dim_date
CROSS JOIN (SELECT DISTINCT location_id FROM score_history) s
WHERE date >= '2019-01-01'::date) sub
,LATERAL(SELECT score FROM score_history sc
WHERE sc.happened_at::date <= sub.date
AND sc.location_id = sub.location_id
ORDER BY happened_at DESC LIMIT 1) l
,LATERAL(SELECT MIN(happened_at::date) m1, MAX(happened_at::date) m2
FROM score_history sc
WHERE sc.location_id = sub.location_id) lm
WHERE sub.date BETWEEN lm.m1 AND lm.m2
ORDER BY location_id, date;
How it works:
1) s
(it is cross join of all dates per location_id)
2) l
(selecting score per location)
3) lm
(selecting min/max date per location for filtering)
4) WHERE
filter dates on range that is available, it could be relaxed if needed