Search code examples
mariadbstored-functions

Sum of differences between N pairs of values spread across multiple rows


I want to find the total amount of hours worked by an employee for a given date. The total hours would be the sum of the differences between two matching pairs of punch times. "Matching pair" here defined by one of PunchTypeID 1, and one of PunchTypeID 2. There could be N number of matching sets for a given date, and there could be unpaired matches (a punch in with no punch out). Unpaired matches should be discarded for this process.

The time clock punches of employees are stored in the following schema.

+-------------+----------+-------------+
|     PunchID |      INT | Primary Key |
|  EmployeeID |      INT | Foreign Key |
|   PunchTime | DATETIME |             |
| PunchTypeID |  TINYINT | Foreign Key |
+-------------+----------+-------------+

So given the following entries, I would like find some way to get an output of 07:30.

SELECT TIME(PunchTime), PunchTypeID FROM TimeTable; 
+-----------+-------------+
| PunchTime | PunchTypeID |
+-----------+-------------+
|     08:00 |           1 |
|     12:00 |           2 |
|     13:00 |           1 |
|     16:30 |           2 |
+-----------+-------------+

I thought a stored function would be best, but I can't seem to find any way to query from within a function, then perform operations based on that query.

I've also looked into using a window function, but couldn't make heads or tails of how to apply one here.

Ideally, I'd like to avoid having to do this with a proper programming language, because I'd like to be able to use this function (or whatever the answer is) in other queries. For example, finding the total hours worked in a month by department.

This is running on 10.6.4-MariaDB.


Solution

  • As an example, we use the following data:

    select EmployeeID, PunchTime, PunchTypeID from TimeTable;
    +------------+---------------------+-------------+
    | EmployeeID | PunchTime           | PunchTypeID |
    +------------+---------------------+-------------+
    |          1 | 2022-09-29 08:00:00 |           1 |
    |          1 | 2022-09-29 12:00:00 |           2 |
    |          1 | 2022-09-29 13:00:00 |           1 |
    |          1 | 2022-09-29 14:00:00 |           1 |
    |          1 | 2022-09-29 17:30:00 |           2 |
    +------------+---------------------+-------------+
    

    There is an entry at row 3, which should be ignored since there is no corresponding "punch out". So we need to rank the results using DESC_RANK for getting a group identifier:

    select PunchTime, PunchTypeID - dense_rank() over (order by PunchTime) as result from TimeTable where date(PunchTime)="2022-09-29" and EmployeeID=1;
    +---------------------+--------+
    | PunchTime           | result |
    +---------------------+--------+
    | 2022-09-29 08:00:00 |      0 |
    | 2022-09-29 12:00:00 |      0 |
    | 2022-09-29 13:00:00 |     -2 |
    | 2022-09-29 14:00:00 |     -3 |
    | 2022-09-29 17:30:00 |     -3 |
    +---------------------+--------+
    

    Now we need to calculate minimum and maximum values per group and substract their values. For groups with only 1 entry the difference will be always zero:

    select timediff(max(times.PunchTime), min(times.Punchtime)) from (select PunchTime, PunchTypeID - dense_rank() over (order by PunchTime) as result from TimeTable where date(PunchTime)="2022-09-29" and EmployeeID=1) as times group by result;
    +------------------------------------------------------+
    | timediff(max(times.PunchTime), min(times.Punchtime)) |
    +------------------------------------------------------+
    | 03:30:00                                             |
    | 00:00:00                                             |
    | 04:00:00                                             |
    +------------------------------------------------------+
    

    Finally to avoid manual addition of time values we have to use another subquery:

    select sec_to_time(sum(time_to_sec(total.timediff))) as working_hours from (select timediff(max(times.PunchTime), min(times.PunchTime)) as timediff from    (select PunchTime, PunchTypeID - dense_rank() over (order by PunchTime) as result from TimeTable where date(PunchTime)="2022-09-29" and EmployeeID=1) as times group by times.result) as total;
    +---------------+
    | working_hours |
    +---------------+
    | 07:30:00      |
    +---------------+