Search code examples
sqlpostgresqlgroup-bymaxwindow-functions

Count repeating consecutive events as groups in postgresql


I have a PostgreSQL dataset with employee name, entry id and entry times. These are repeated across several rows depending on how many times an employee clocked in. What I'm trying to do is group consecutive entries for each employee and count the number of entries.

Input

entry_id    emp_name        entry_time
100         John Doe        18/10/2021
101         Mark Foo        18/10/2021
102         Angie Genie     19/10/2021
103         Angie Genie     19/10/2021
104         Angie Genie     19/10/2021
105         John Doe        20/10/2021
106         John Doe        20/10/2021
107         Angie Genie     21/10/2021
108         Angie Genie     21/10/2021
109         Mark Foo        22/10/2021
110         Mark Foo        22/10/2021
111         Mark Foo        23/10/2021
112         Mark Foo        24/10/2021

Desired Output

emp_name            last entry_time     no of entries
John Doe            18/10/2021          1
Mark Foo            18/10/2021          1
Angie Genie         19/10/2021          3
John Doe            20/10/2021          2
Angie Genie         21/10/2021          2
Mark Foo            24/10/2021          4

I have tried using gaps and island approach to assign consecutive emp_names into groups to enable me obtain the entry counts but I have not been able to do so. I was hoping to assign groups like below but nothing is working so far.

entry_id        emp_name            entry_time          group       no of entries
100             John Doe            18/10/2021          1           1
101             Mark Foo            18/10/2021          2           1
102             Angie Genie         19/10/2021          3           3
103             Angie Genie         19/10/2021          3           3
104             Angie Genie         19/10/2021          3           3
105             John Doe            20/10/2021          4           2
106             John Doe            20/10/2021          4           2
107             Angie Genie         21/10/2021          5           2
108             Angie Genie         21/10/2021          5           2
109             Mark Foo            22/10/2021          6           4
110             Mark Foo            22/10/2021          6           4
111             Mark Foo            23/10/2021          6           4
112             Mark Foo            24/10/2021          6           4

Is this possible in PostgreSQl? Any help is appreciated. Thank you.


Solution

  • You can use the window function LAG() to check for the previous emp_name of each row and the window function SUM() to create the groups of consecutive repeating emp_names.
    Finally aggregate by each group:

    SELECT emp_name, 
           MAX(entry_time) last_entry_time,
           COUNT(*) no_of_entries 
    FROM (
      SELECT *, SUM(flag) OVER (ORDER BY entry_time, entry_id) grp
      FROM (
        SELECT *, (emp_name <> LAG(emp_name, 1, '') OVER (ORDER BY entry_time, entry_id))::int flag
        FROM tablename
      ) t
    ) t
    GROUP BY grp, emp_name
    ORDER BY grp;
    

    See the demo.