Situation:
Imagine some script that, upon completion, inserts the current timestamp and some other data into a MySQL table. It's executed every thirty minutes, sometimes not at all, leaving gaps in the data.
Goal:
Have a query that fetches all data with the timestamp rounded to the closest half hour and empty rows (all fields except for the timestamp should be null) when there's no data.
Restrictions:
Neither the table structure, the data itself nor the script can be altered.
Problem:
The only solution I could come up with that yields the desired result doesn't scale. Currently the actual table counts about 50'000 rows and it already takes more than fifteen minutes to complete the query.
Example:
CREATE TABLE IF NOT EXISTS `statuses` (
`timestamp` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
`status` INT NOT NULL DEFAULT '0',
PRIMARY KEY (`timestamp`)
);
INSERT
IGNORE INTO `statuses` (`timestamp`, `status`)
VALUES
('2023-01-01 00:03:34', '164850'),
('2023-01-01 00:31:23', '794088'),
('2023-01-01 03:31:28', '686754'),
('2023-01-01 04:01:15', '684711'),
('2023-01-01 05:31:35', '116777'),
('2023-01-01 06:01:52', '469332'),
('2023-01-01 06:31:55', '816300'),
('2023-01-01 08:33:53', '309583'),
('2023-01-01 09:03:54', '847976'),
('2023-01-01 09:31:33', '812517');
WITH RECURSIVE `timestamps`(`timestamp`) AS (
SELECT
(
SELECT
FROM_UNIXTIME(
UNIX_TIMESTAMP(MIN(`timestamp`)) - MOD(UNIX_TIMESTAMP(MIN(`timestamp`)), 1800)
)
FROM
`statuses`
)
UNION
ALL
SELECT
DATE_ADD(`timestamp`, INTERVAL 30 MINUTE)
FROM
`timestamps`
WHERE
`timestamp` < (
SELECT
FROM_UNIXTIME(
UNIX_TIMESTAMP(MAX(`timestamp`)) - MOD(UNIX_TIMESTAMP(MAX(`timestamp`)), 1800)
)
FROM
`statuses`
)
)
SELECT
`t`.`timestamp`,
`s`.`status`
FROM
`timestamps` AS `t`
LEFT OUTER JOIN `statuses` AS `s` ON `t`.`timestamp` = FROM_UNIXTIME(
UNIX_TIMESTAMP(`s`.`timestamp`) - MOD(UNIX_TIMESTAMP(`s`.`timestamp`), 1800)
)
ORDER BY
`t`.`timestamp` ASC;
We can simplify the recursive part a little. As Thorsten Kettner mentions, there is no need to reselect the max status date on each iteration, we can do that in the anchor - and I would also argue that we don't even need to round the max date (although that's micro-optimization).
When it comes to the outer query, I would not recommend applying functions on the status timestamp; this is the primary key of the table, against which we do want to run a SARGEable predicate: let's use a half-open interval instead (and we don't need to perform the unixtime conversion again here):
with recursive timestamps (ts, max_ts) as (
select from_unixtime(floor(unix_timestamp(min(ts)) / 1800) * 1800) ts, max(ts) max_ts
from statuses
union all
select ts + interval 30 minute, max_ts from timestamps where ts + interval 30 minute <= max_ts
)
select t.ts, s.status
from timestamps t
left join statuses s on s.ts >= t.ts and s.ts < t.ts + interval 30 minute
order by t.ts
Note that I renamed column timestamp
to ts
, to avoid clashing with the corresponding SQL keyword.
If that's not good enough, then one alternative would be to materialize the results of the recursive query in a calendar table.
You would typically cover a large period of time with the table. You can use the recursive query to create it:
create table timestamp_calendar as
with recursive timestamps as (
select '2022-01-01 00:00:00' ts, '2022-01-02 12:00:00' max_ts -- short period for testing
union all
select ts + interval 30 minute, max_ts
from timestamps
where ts < max_ts
)
select * from timestamps;
We can declare a primary key to benefit the underlying index:
alter table timestamp_calendar add primary key (ts);
Then we can use the table in our query. Ideally, you know in advance which date range you are looking for. But if you don't, we can bring the min/max status dates and use them to pre-filter the calendar table.
select t.ts, s.status
from timestamp_calendar t
inner join (select min(ts) min_ts, max(ts) max_ts from statuses) x
on t.ts > x.min_ts - interval 30 minute
and t.ts <= x.max_ts
left join statuses s
on s.ts >= t.ts
and s.ts < t.ts + interval 30 minute
order by t.ts
Here is a demo on DB Fiddle