Search code examples
mysqlsqldistinct

MySQL Query to collect unique IDs that are X seconds more than selected ID


I have a table with user IDS and a timestamp. I am looking to collect the distinct userids for the given day (2021-09-03 in this case). However if the date and time of the userid is greater than X seconds then I want to print this userid as well (as I know this is not a duplicate).

In other words a user enters the system and it logs him in (this is a true login), however it may log him in several times in the database, several rows of repeat data where the seconds or milliseconds differ (as a result of a number of legacy systems). He may log in again 10 seconds later (or any time frame later) - this is another true login (but might also contain several rows of login again). Therefore I want to show the userid and date and time for only the true logins.

rowid   userid  dateandtime
10      0       2021-09-03 18:48:44.517771
11      0       2021-09-03 18:48:44.873727
12      0       2021-09-03 18:48:44.936010
13      0       2021-09-03 18:48:45.158109
14      0       2021-09-03 18:48:46.385015
15      10      2021-09-03 18:49:00.433325
16      0       2021-09-03 20:03:35.584817
17      0       2021-09-03 20:03:35.633030
18      0       2021-09-03 20:03:36.679559
19      0       2021-09-03 20:03:37.726138
20      4357    2021-09-03 22:41:43.549345
21      4357    2021-09-03 22:41:44.127240
22      4357    2021-09-03 22:41:44.168634
23      4357    2021-09-03 22:41:44.454468

So you can see userid 0 logged in at 2021-09-03 18:48:44.517771. The other legacy systems continued to log him in for several seconds and milliseconds (these are the false logins). Then user 10 logged in at 2021-09-03 18:49:00.433325 and user 0 legitimately logged in again at 2021-09-03 20:03:35.584817 (and false logins) followed by user 4357 at 2021-09-03 22:41:43.549345 (and then false logins again for this user).

I need to create a MySQL query to clean out the noise and just result with the true logins:

rowid   userid  dateandtime
10      0       2021-09-03 18:48:44.517771
15      10      2021-09-03 18:49:00.433325
16      0       2021-09-03 20:03:35.584817
20      4357    2021-09-03 22:41:43.549345

So distinct logins where the date and time is greater than X seconds for the given user.

Thank you.


Solution

  • You can compare the time in seconds with the previous login by the same user using the TIMESTAMPDIFF and LAG functions.

    SET @X = 1; -- X seconds
    
    WITH t1 AS (
      SELECT *,
        TIMESTAMPDIFF(
          second, 
          LAG(dateandtime) OVER (PARTITION BY userid ORDER BY dateandtime), 
          dateandtime
        ) AS seconds
      FROM t
    ) 
    SELECT rowid, userid, dateandtime
    FROM t1
    WHERE seconds IS NULL OR seconds > @X
    ORDER BY dateandtime
    

    db<>fiddle