I have a table of meetings, each of which has a start and end time expressed as an integer Unix timestamp, as well as a group of people who are attending this meeting. I need to send out notifications to each group of people once all their meetings for the day have completed. To do this I need to find the last meeting for every combination of (day, group). I've adapted this answer for my situation, and it works:
SELECT MAX(`starts_at`), `id`, `group_id`, DATE(FROM_UNIXTIME(`starts_at`)) `day`
FROM `meeting`
GROUP BY `day`, `group_id`;
However, despite trying out different combinations of indexes, I can't find one that stops this query from performing a full-table scan. The EXPLAIN result is always as follows:
+------+-------------+---------+------+---------------+------+---------+------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+---------+------+---------------+------+---------+------+------+---------------------------------+
| 1 | SIMPLE | meeting | ALL | NULL | NULL | NULL | NULL | 16 | Using temporary; Using filesort |
+------+-------------+---------+------+---------------+------+---------+------+------+---------------------------------+
My table is defined this way:
CREATE TABLE `meeting` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`group_id` int(11) NOT NULL,
`starts_at` int(11) NOT NULL,
`ends_at` int(11) NOT NULL,
... other fields ...,
PRIMARY KEY (`id`),
CONSTRAINT `meeting_ibfk_1` FOREIGN KEY (`group_id`) REFERENCES `group` (`id`)
);
What combination of indexes, and what query, would I need here? One of the solutions I see is to create an indexed column that stores the event's day, perhaps as some sort of ordinal, and while this is an option I'd prefer to avoid it if possible in order to not have redundant values in a single row.
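Before looking at indexes, it may help to pin down the result being asked for. The following is a minimal plain-Python sketch (not SQL, and not the asker's code) of "latest meeting per (group, day)", assuming days are UTC calendar days obtained by integer division of the Unix timestamp. The Meeting tuple, the function name and the sample rows are hypothetical. Its output can serve as a reference for checking any SQL version.

```python
from collections import namedtuple

# Hypothetical row type mirroring the relevant columns of the meeting table.
Meeting = namedtuple("Meeting", "id group_id starts_at ends_at")

SECONDS_PER_DAY = 60 * 60 * 24


def last_meeting_per_group_day(meetings):
    """Map (group_id, UTC day number) -> the meeting with the latest starts_at.

    On a tie the first meeting seen is kept; SQL versions may behave differently.
    """
    latest = {}
    for m in meetings:
        key = (m.group_id, m.starts_at // SECONDS_PER_DAY)
        if key not in latest or m.starts_at > latest[key].starts_at:
            latest[key] = m
    return latest


sample = [
    Meeting(1, 1, 1704099600, 1704103200),  # 2024-01-01 09:00 UTC
    Meeting(2, 1, 1704117600, 1704121200),  # 2024-01-01 14:00 UTC
    Meeting(3, 1, 1704189600, 1704193200),  # 2024-01-02 10:00 UTC
    Meeting(4, 2, 1704106800, 1704110400),  # 2024-01-01 11:00 UTC
]

for (group_id, day), m in sorted(last_meeting_per_group_day(sample).items()):
    print(group_id, day, m.id)
```

For the sample this prints one line per (group, day): meeting 2 and meeting 3 for group 1, and meeting 4 for group 2.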
If you are running MySQL 8.0.13 or higher, you could try an index on expressions:
create index idx_meeting on meeting(
group_id,
(date(from_unixtime(starts_at))),
starts_at desc
);
This puts the columns and expressions that appear in the GROUP BY clause first, followed by the aggregated column; since we are looking for the max(), that column is sorted in descending order in the index.
In this DB Fiddle, the index appears to be picked up by the database (note that I have no data to play with, so the results may differ in your environment):
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | meeting | null | index | idx_meeting | idx_meeting | 12 | null | 1 | 100.00 | Using index; Using temporary |
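To experiment with the idea without a MySQL server, here is a rough sketch that swaps in SQLite through Python's sqlite3 module. A reasonably recent SQLite also supports indexes on expressions; date(starts_at, 'unixepoch') stands in for DATE(FROM_UNIXTIME(starts_at)), with the difference that SQLite's version always works in UTC. The sample rows are made up, and whether the planner actually chooses idx_meeting depends on the SQLite version and table statistics, so the printed plan is for inspection only and says nothing definitive about MySQL.

```python
import sqlite3

# Made-up sample rows: (id, group_id, starts_at, ends_at) in Unix seconds (UTC).
SAMPLE_ROWS = [
    (1, 1, 1704099600, 1704103200),  # 2024-01-01 09:00 UTC
    (2, 1, 1704117600, 1704121200),  # 2024-01-01 14:00 UTC
    (3, 1, 1704189600, 1704193200),  # 2024-01-02 10:00 UTC
    (4, 2, 1704106800, 1704110400),  # 2024-01-01 11:00 UTC
]

# The grouping query, with SQLite's date(x, 'unixepoch') in place of
# MySQL's DATE(FROM_UNIXTIME(x)).
MAX_PER_GROUP_DAY = """
SELECT group_id, date(starts_at, 'unixepoch') AS day, MAX(starts_at)
FROM meeting
GROUP BY group_id, date(starts_at, 'unixepoch')
ORDER BY group_id, day
"""


def make_db(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE meeting (
               id INTEGER PRIMARY KEY,
               group_id INTEGER NOT NULL,
               starts_at INTEGER NOT NULL,
               ends_at INTEGER NOT NULL)"""
    )
    # Same key order as the MySQL functional index: GROUP BY keys first,
    # then the aggregated column in descending order.
    conn.execute(
        """CREATE INDEX idx_meeting ON meeting(
               group_id, date(starts_at, 'unixepoch'), starts_at DESC)"""
    )
    conn.executemany("INSERT INTO meeting VALUES (?, ?, ?, ?)", rows)
    return conn


conn = make_db(SAMPLE_ROWS)
for step in conn.execute("EXPLAIN QUERY PLAN " + MAX_PER_GROUP_DAY):
    print(step)  # shows whether idx_meeting was chosen; not guaranteed
print(conn.execute(MAX_PER_GROUP_DAY).fetchall())
```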
A further option would be to simplify the query a little by using arithmetic rather than timestamp conversion. The resulting day number is easy to index, either directly as an index on an expression or, if your version does not support those, through a stored generated column:
alter table meeting
add starts_date int as (floor(starts_at / 60 / 60 / 24)) -- days since 1970-01-01 (UTC)
stored;
create index idx_meeting2 on meeting(group_id, starts_date, starts_at desc);
Then we run the query as follows:
SELECT MAX(starts_at), group_id, DATE(FROM_UNIXTIME(MAX(starts_at))) day
FROM meeting
GROUP BY group_id, starts_date;
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | meeting | index | null | idx_meeting2 | 12 | null | 1 | Using index |
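The arithmetic is simple enough to check by hand. The Python sketch below mirrors floor(starts_at / 60 / 60 / 24) (the day number) and that value multiplied back by 86400 (midnight UTC of the same day). One caveat worth being aware of: this arithmetic always buckets by UTC day, while MySQL's FROM_UNIXTIME() returns its value in the session time zone, so the two approaches can put a late-evening meeting on different days. The helper names are hypothetical, and local_date only models a fixed-offset session time zone such as '+02:00', not named zones with daylight saving time.

```python
from datetime import date, datetime, timedelta, timezone

SECONDS_PER_DAY = 60 * 60 * 24  # 86400


def day_number(ts: int) -> int:
    """UTC day index, like floor(starts_at / 60 / 60 / 24) in SQL."""
    return ts // SECONDS_PER_DAY  # Python's // floors, matching FLOOR()


def day_start(ts: int) -> int:
    """Unix timestamp of 00:00 UTC on the day that contains ts."""
    return day_number(ts) * SECONDS_PER_DAY


def local_date(ts: int, utc_offset_hours: int) -> date:
    """Calendar date of ts in a fixed-offset zone (a session time zone like '+02:00')."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(ts, tz=tz).date()


late = 1704150000  # 2024-01-01 23:00:00 UTC
print(day_number(late), day_start(late))  # 19723 1704067200
print(local_date(late, 0), local_date(late, 2))  # 2024-01-01 2024-01-02
```

If the session time zone is UTC, both ways of deriving the day agree; otherwise it is worth deciding explicitly which definition of "day" the notifications should use.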
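Edit: you want the latest meeting per group and per day, together with its other columns such as id. Aggregation is not appropriate for that: a non-aggregated column like id in the original query may come from any row of the group (and the query is rejected when ONLY_FULL_GROUP_BY is enabled). Instead, we need to filter.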
Consider the following set up:
alter table meeting
add starts_day int(11) as (floor(starts_at / 60 / 60 / 24) * 60 * 60 * 24)
stored;
create index idx_meeting_day on meeting(group_id, starts_day, starts_at desc);
Now we can use the following query:
select m.id, m.starts_at, m.group_id, date(from_unixtime(m.starts_day)) day
from meeting m
where m.starts_at = (
select max(m1.starts_at)
from meeting m1
where m1.group_id = m.group_id and m1.starts_day = m.starts_day
);
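The correlated subquery can use the index to read the maximum starts_at for each (group_id, starts_day) pair directly, and the outer query may also use it partially. Note that if two meetings of the same group start at exactly the same time on the same day, both rows are returned.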
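The filtering approach can be tried end to end with the following sketch. It swaps in SQLite via Python's sqlite3 module instead of MySQL. A plain starts_day column, filled once with an UPDATE, stands in for MySQL's STORED generated column, and date(x, 'unixepoch') replaces date(from_unixtime(x)). The index name idx_meeting_day and the sample rows are hypothetical.

```python
import sqlite3

SECONDS_PER_DAY = 60 * 60 * 24

# Made-up sample rows: (id, group_id, starts_at, ends_at) in Unix seconds (UTC).
SAMPLE_ROWS = [
    (1, 1, 1704099600, 1704103200),  # 2024-01-01 09:00 UTC
    (2, 1, 1704117600, 1704121200),  # 2024-01-01 14:00 UTC
    (3, 1, 1704189600, 1704193200),  # 2024-01-02 10:00 UTC
    (4, 2, 1704106800, 1704110400),  # 2024-01-01 11:00 UTC
]

# Correlated subquery: keep each row whose starts_at is the maximum
# for its (group_id, starts_day) pair.
LATEST_PER_GROUP_DAY = """
SELECT m.id, m.starts_at, m.group_id, date(m.starts_day, 'unixepoch') AS day
FROM meeting m
WHERE m.starts_at = (
    SELECT MAX(m1.starts_at)
    FROM meeting m1
    WHERE m1.group_id = m.group_id AND m1.starts_day = m.starts_day
)
ORDER BY m.group_id, day
"""


def build_demo_db(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE meeting (
               id INTEGER PRIMARY KEY,
               group_id INTEGER NOT NULL,
               starts_at INTEGER NOT NULL,
               ends_at INTEGER NOT NULL,
               starts_day INTEGER)"""
    )
    conn.executemany(
        "INSERT INTO meeting (id, group_id, starts_at, ends_at) VALUES (?, ?, ?, ?)",
        rows,
    )
    # Integer division truncates, which equals floor() for positive timestamps:
    # starts_day becomes the Unix timestamp of midnight UTC.
    conn.execute(
        "UPDATE meeting SET starts_day = (starts_at / ?) * ?",
        (SECONDS_PER_DAY, SECONDS_PER_DAY),
    )
    conn.execute(
        "CREATE INDEX idx_meeting_day ON meeting(group_id, starts_day, starts_at DESC)"
    )
    return conn


conn = build_demo_db(SAMPLE_ROWS)
for row in conn.execute(LATEST_PER_GROUP_DAY):
    print(row)  # (id, starts_at, group_id, day)
```

Because the result rows are whole table rows rather than aggregates, id and any other column come from the latest meeting itself; adding a second group 2 meeting with the same starts_at would make both rows appear.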