Search code examples
mysqlsqlmysql-5.6

SQL average time between first and second row in a set


Ouch! My heard hurts .. thought I had it a few times but epic fail :(

I have the following data, millions of rows, indexed, MySQL 5.6.

In this table, there are sets of data, and the uuid is basically a unique id for each set of data.

I need to find the AVERAGE in the data for each 1st and 2nd rows in each set. In other words, how much time past since the set was created with the first insert and the second insert for that same set and then the average of the results.

I can get the average no problem, I just can't seem to get my head around a way to just get the time difference between 1st and 2nd row in each set.

I'm not even going to embarrass myself and paste my broken SQL with my misguided attempts using sub-queries and LIMIT, suffice to say, this one escapes me.

Any help appreciated, beers on me :/

+------+-----------------------------------------+----------------------------+ | id | uuid | stamp | +------+-----------------------------------------+----------------------------+ | 707 | 60b5-d062-5829-c11d-5b71-5d85-075b-a3c5 | 2020-01-01 17:00:28.000000 | | 708 | 60b5-d062-5829-c11d-5b71-5d85-075b-a3c5 | 2020-01-01 17:01:30.000000 | | 709 | 0ccf-94e0-ce72-8092-1975-5bea-6131-c719 | 2020-01-02 14:11:48.000000 | | 710 | 59c8-60ee-d172-511a-a477-c637-6789-f14a | 2020-01-02 14:23:36.000000 | | 711 | b33b-7584-1fed-e138-28ba-c24a-9b46-88e7 | 2020-01-02 14:24:07.000000 | | 712 | eddc-b12a-5ef2-baea-cf53-7287-5805-d922 | 2020-01-02 14:24:26.000000 | | 713 | 257b-fc66-6d7a-ba21-727e-1da7-0ee1-714c | 2020-01-02 14:25:31.000000 | | 718 | c5d9-acba-9a12-aacb-cf45-c5a9-2b8d-314c | 2020-01-02 15:46:41.000000 | | 719 | 0ccf-94e0-ce72-8092-1975-5bea-6131-c719 | 2020-01-02 15:55:42.000000 | | 720 | c5d9-acba-9a12-aacb-cf45-c5a9-2b8d-314c | 2020-01-02 15:56:33.000000 | | 722 | c5d9-acba-9a12-aacb-cf45-c5a9-2b8d-314c | 2020-01-02 16:16:14.000000 | | 723 | c5d9-acba-9a12-aacb-cf45-c5a9-2b8d-314c | 2020-01-02 16:21:25.000000 | | 726 | 6610-a9df-358d-0065-beb8-cea1-82a6-3258 | 2020-01-02 17:16:33.000000 | | 727 | 6610-a9df-358d-0065-beb8-cea1-82a6-3258 | 2020-01-02 17:21:20.000000 | | 728 | 6610-a9df-358d-0065-beb8-cea1-82a6-3258 | 2020-01-02 17:45:07.000000 | | 729 | 6610-a9df-358d-0065-beb8-cea1-82a6-3258 | 2020-01-02 17:50:17.000000 | | 730 | 6610-a9df-358d-0065-beb8-cea1-82a6-3258 | 2020-01-02 18:14:02.000000 | | 731 | 6610-a9df-358d-0065-beb8-cea1-82a6-3258 | 2020-01-02 18:27:48.000000 | | 732 | 6610-a9df-358d-0065-beb8-cea1-82a6-3258 | 2020-01-02 18:28:57.000000 | | 733 | c193-a46f-1104-3ee3-7387-94a8-ef32-a85e | 2020-01-02 18:40:40.000000 | | 734 | c193-a46f-1104-3ee3-7387-94a8-ef32-a85e | 2020-01-02 18:40:49.000000 |


Solution

  • If a user id only appears twice, then this is trivial. You have millions of rows, so let's try to avoid a sort and assume you have the right indexes.

    Here is one way to get the earliest two rows:

    select t.*
    from t
    where t.stamp <= (select t2.stamp
                      from t t2
                      where t2.uuid = t.uuid
                      order by t2.stamp asc
                      limit 1,1
                     );
    

    Very important: You want an index on (uuid, stamp) for any hope of performance.

    Then, just aggregate:

    select uuid, timestampdiff(second, min(stamp), max(stamp))
    from (select t.*
          from t
          where t.stamp <= (select t2.stamp
                            from t t2
                            where t2.uuid = t.uuid
                            order by t2.stamp asc
                            limit 1,1
                           )
         ) t
    group by uuid;