Tags: datetime, mariadb, query-optimization, greatest-n-per-group, mariadb-10.3

In MariaDB, how to select the last event in every day and create indexes for it?


I have a table of meetings, each of which has a start and end time expressed as an integer Unix timestamp, as well as a group of people who are attending this meeting. I need to send out notifications to each group of people once all their meetings for the day have completed. To do this I need to find the last meeting for every combination of (day, group). I've adapted this answer for my situation, and it works:

SELECT MAX(`starts_at`), `id`, `group_id`, DATE(FROM_UNIXTIME(`starts_at`)) `day`
FROM `meeting`
GROUP BY `day`, `group_id`;

However, despite trying out different combinations of indexes, I can't seem to find one that would make this query not perform a full-table scan. The EXPLAIN result is always as follows:

+------+-------------+---------+------+---------------+------+---------+------+------+---------------------------------+
| id   | select_type | table   | type | possible_keys | key  | key_len | ref  | rows | Extra                           |
+------+-------------+---------+------+---------------+------+---------+------+------+---------------------------------+
|    1 | SIMPLE      | meeting | ALL  | NULL          | NULL | NULL    | NULL |   16 | Using temporary; Using filesort |
+------+-------------+---------+------+---------------+------+---------+------+------+---------------------------------+

My table is defined this way:

CREATE TABLE `meeting` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `group_id` int(11) NOT NULL,
  `starts_at` int(11) NOT NULL,
  `ends_at` int(11) NOT NULL,
  ... other fields ..., 
  PRIMARY KEY (`id`),
  CONSTRAINT `meeting_ibfk_1` FOREIGN KEY (`group_id`) REFERENCES `group` (`id`)
)

What combination of indexes, and what query, would I need here? One of the solutions I see is to create an indexed column that stores the event's day, perhaps as some sort of ordinal, and while this is an option I'd prefer to avoid it if possible in order to not have redundant values in a single row.


Solution

  • If you are running MySQL 8.0.13 or higher, you could try an index on expressions:

    create index idx_meeting on meeting(
        group_id, 
        (date(from_unixtime(starts_at))), 
        starts_at desc
    );
    

    This puts the columns and expressions that appear in the group by clause first, followed by the column that is aggregated; since we will be looking for the max(), we want that column sorted in descending order in the index.

    In this DB Fiddle, it looks like the index is picked up by the database (but note that I have no data to play with, so the results may be different in your environment):

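    +----+-------------+---------+------------+-------+---------------+-------------+---------+------+------+----------+------------------------------+
    | id | select_type | table   | partitions | type  | possible_keys | key         | key_len | ref  | rows | filtered | Extra                        |
    +----+-------------+---------+------------+-------+---------------+-------------+---------+------+------+----------+------------------------------+
    |  1 | SIMPLE      | meeting | null       | index | idx_meeting   | idx_meeting | 12      | null |    1 |   100.00 | Using index; Using temporary |
    +----+-------------+---------+------------+-------+---------------+-------------+---------+------+------+----------+------------------------------+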

    A further option would be to simplify the query a little. We could use arithmetic rather than timestamp conversion. We can easily put an index on this, or use a computed column if your version does not support indexes on expressions:

    alter table meeting 
        add starts_date int as (floor(starts_at / 60 / 60 / 24))
        stored;
    
    create index idx_meeting2 on meeting(group_id, starts_date, starts_at desc);
    

    Then we run the query as follows:

    SELECT MAX(starts_at), group_id, DATE(FROM_UNIXTIME(MAX(starts_at))) day
    FROM meeting
    GROUP BY group_id, starts_date;
    

    EXPLAIN output in MariaDB 10.3:

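    +----+-------------+---------+-------+---------------+--------------+---------+------+------+-------------+
    | id | select_type | table   | type  | possible_keys | key          | key_len | ref  | rows | Extra       |
    +----+-------------+---------+-------+---------------+--------------+---------+------+------+-------------+
    |  1 | SIMPLE      | meeting | index | null          | idx_meeting2 | 12      | null |    1 | Using index |
    +----+-------------+---------+-------+---------------+--------------+---------+------+------+-------------+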
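
The day-bucket arithmetic behind the stored column can be checked outside the database. Below is a minimal Python sketch of it; the helper names and the sample timestamps are made up for illustration. One thing it makes visible: integer division by 86400 buckets by UTC day, while from_unixtime() converts using the session time zone, so the two approaches should only be expected to agree when the session time zone is UTC.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 60 * 60 * 24

def day_number(starts_at: int) -> int:
    """Days since 1970-01-01 UTC, i.e. floor(starts_at / 60 / 60 / 24)."""
    return starts_at // SECONDS_PER_DAY

def day_start(starts_at: int) -> int:
    """Unix timestamp of UTC midnight for the day containing starts_at."""
    return day_number(starts_at) * SECONDS_PER_DAY

# Hypothetical meetings: two on the same UTC day, one just after midnight.
morning = 1_600_000_000          # 2020-09-13 12:26:40 UTC
evening = morning + 6 * 3600     # 2020-09-13 18:26:40 UTC
next_day = morning + 12 * 3600   # 2020-09-14 00:26:40 UTC

for ts in (morning, evening, next_day):
    midnight = datetime.fromtimestamp(day_start(ts), tz=timezone.utc)
    print(ts, day_number(ts), midnight.date())
```

The first two timestamps land in the same bucket and the third in the next one, which is exactly the grouping the index on (group_id, day, starts_at) relies on.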

    Edit: you want the latest event per group and per day. Aggregation is not appropriate here, because it cannot reliably return the other columns of the matching row (such as its id). Instead, we need to filter.

    Consider the following set up:

    alter table meeting 
        add starts_day int(11) as (floor(starts_at / 60 / 60 / 24) * 60 * 60 * 24)
        stored;
    create index idx_meeting on meeting(group_id, starts_day, starts_at desc);
    

    Now we can use the following query:

    select starts_at, group_id, date(from_unixtime(starts_day)) day
    from meeting m
    where starts_at = (
        select max(m1.starts_at)
        from meeting m1
        where m1.group_id = m.group_id and m1.starts_day = m.starts_day
    )
    

    The subquery takes advantage of the index, which may also be partially used by the outer query.

    Demo on DB Fiddle
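
To check the logic of the filtering query on a few rows, here is a small sketch that runs it against an in-memory SQLite database from Python. SQLite is only a stand-in for MariaDB here, so it says nothing about the MariaDB execution plan; the sample rows are made up, and starts_day is filled in by hand rather than being a stored generated column. The id column is added to the select list to show that the whole row is available.

```python
import sqlite3

DAY = 60 * 60 * 24

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE meeting (
        id         INTEGER PRIMARY KEY,
        group_id   INTEGER NOT NULL,
        starts_at  INTEGER NOT NULL,
        starts_day INTEGER NOT NULL  -- computed in Python below (assumption)
    );
    CREATE INDEX idx_meeting ON meeting (group_id, starts_day, starts_at DESC);
""")

# Made-up sample data: (id, group_id, starts_at)
rows = [
    (1, 1, 1_600_000_000),
    (2, 1, 1_600_000_000 + 3600),  # same day, later: last for group 1
    (3, 2, 1_600_000_000 + 1800),  # only meeting of group 2 that day
    (4, 1, 1_600_000_000 + DAY),   # group 1 again, on the next day
]
conn.executemany(
    "INSERT INTO meeting (id, group_id, starts_at, starts_day) VALUES (?, ?, ?, ?)",
    [(i, g, s, s // DAY * DAY) for i, g, s in rows],
)

# Same shape as the filtering query: keep a row only if its starts_at
# is the maximum within its (group_id, starts_day) bucket.
last_meetings = conn.execute("""
    SELECT m.id, m.group_id, m.starts_at
    FROM meeting m
    WHERE m.starts_at = (
        SELECT MAX(m1.starts_at)
        FROM meeting m1
        WHERE m1.group_id = m.group_id AND m1.starts_day = m.starts_day
    )
    ORDER BY m.id
""").fetchall()
print(last_meetings)
```

Meeting 1 is dropped because meeting 2 starts later on the same day for the same group. Note that if two meetings of a group share the same latest starts_at on a day, this filter returns both of them.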