Search code examples
mysqlsqltemporal-database

Query to find active records on a given date in a table recording "start" and "stop" dates


I have a list of "start / stop" activity logged in a table, each one associated with a date. I need to determine which users had "started" on a particular date - i.e. were in progress with the task. My current setup and query can be represented by this simplistic view of it:

CREATE TABLE `registration_statuses` (
    `status_id` INT(11) NOT NULL AUTO_INCREMENT,
    `status_user_id` INT(10) UNSIGNED NOT NULL DEFAULT '0',
    `status_activity` ENUM('start','stop') DEFAULT 'start',
    `status_date` DATE NULL DEFAULT NULL,
    PRIMARY KEY (`status_id`),
    INDEX `status_user_id` (`status_user_id`)
);

INSERT INTO `registration_statuses` (`status_user_id`, `status_activity`, `status_date`)
VALUES (1, 'start', '2020-01-01'),
       (2, 'start', '2020-01-02'),
       (1, 'stop', '2020-01-19'),
       (1, 'start', '2020-01-25'),
       (2, 'stop', '2020-01-31'),
       (1, 'stop', '2020-01-31');

I am then running this query:

SELECT `rs`.`status_user_id`
FROM `registration_statuses` `rs`
  INNER JOIN (
    SELECT `status_user_id`, MAX(status_date) `last_date`
    FROM `registration_statuses`
    WHERE `status_date` < '2020-01-03'
    GROUP BY `status_user_id`
  ) `srs` ON `rs`.`status_user_id` = `srs`.`status_user_id`
            AND `rs`.`status_date` = `srs`.`last_date`
WHERE `status_activity` = 'start';

(See http://sqlfiddle.com/#!9/c8d371/1/0)

By changing the date in the query, this query returns a list of user ids that tell me who is engaged (i.e. has started a task) on that specific date. However, the users are considered (in real life) to have been engaged with the task on the actual date that they stop it. This query doesn't allow for this in that if you were to change the date in the query to reflect 2020-01-19, the day on which user 1 stopped, the query would only return user 2.

I tried changing the <= condition to a strict <, and while this solves that part of the problem, users are not considered busy on the day that they start. With a strict <, only user is returned on '2019-01-25' whereas I want both users to appear.

My only "viable" solution, at this point, is to merge the results of the two versions of the queries (in the form of a DISTINCT / UNION query), but I can't help but think that there must be a more efficient way of obtaining the results I need.


Solution

  • Does this help?

    SELECT a.status_id
         , a.status_user_id 
         , a.status_date start
         , MIN(b.status_date) stop
      FROM registration_statuses a
      LEFT
      JOIN registration_statuses b
        ON b.status_user_id = a.status_user_id
       AND b.status_id > a.status_id
       AND b.status_activity = 'stop'
     WHERE a.status_activity = 'start'
     GROUP 
        BY a.status_id;
    
    +-----------+----------------+------------+------------+
    | status_id | status_user_id | start      | stop       |
    +-----------+----------------+------------+------------+
    |         1 |              1 | 2020-01-01 | 2020-01-19 |
    |         2 |              2 | 2020-01-02 | 2020-01-31 |
    |         4 |              1 | 2020-01-25 | 2020-01-31 |
    +-----------+----------------+------------+------------+