Search code examples
databasepostgresqlaggregate-functions

Postgres: Calculate ratio of table entries that match condition


I have the following two tables in a PostgreSQL database:

dummy=# select * from employee;
 id | name  
----+-------
  1 | John
  2 | Susan
  3 | Jim
  4 | Sarah
(4 rows)

dummy=# select * from stats;
 id | arrival  |    day     | employee_id 
----+----------+------------+-------------
  2 | 08:31:34 | monday     |           2
  4 | 08:15:00 | monday     |           3
  5 | 08:43:00 | monday     |           4
  1 | 08:34:00 | monday     |           1
  7 | 08:29:00 | midweek    |           1
  8 | 08:31:00 | midweek    |           2
  9 | 08:10:00 | midweek    |           3
 10 | 08:40:00 | midweek    |           4
 11 | 08:28:00 | midweek    |           1
 12 | 08:33:00 | midweek    |           2
 14 | 08:21:00 | midweek    |           3
 15 | 08:45:00 | midweek    |           4
 16 | 08:25:00 | midweek    |           1
 17 | 08:35:00 | midweek    |           2
 18 | 08:44:00 | midweek    |           4
 19 | 08:10:00 | friday     |           1
 20 | 08:40:00 | friday     |           2
 21 | 08:30:00 | friday     |           3
 22 | 08:30:00 | friday     |           4
(19 rows)

I want to select all employees that arrive between 8:25 and 8:35 on midweek and friday. I can accomplish that relatively easy with the following query:

SELECT * FROM stats
WHERE
    arrival >= (time '8:30' - interval '5 minutes')
AND
    arrival <= (time '8:30' + interval '5 minutes')
AND
    (day = 'midweek' or day = 'friday');

However, an additional criterion is that I only want to select those employees that arrive at least 60% of the time within the aforementioned time window. This is where I am stuck. I do not know how to calculate that ratio.

What does the Query look like which fulfills all the criteria?

CLARIFICATION

Apparently the above description for the ratio is misleading. When calculating the ratio then only the rows that meet the criteria (day = 'midweek' or day = 'friday') shall be considered. So in the sample data John and Susan show up four times for work on midweek and friday. Three out of those four times they are punctual. Hence, the ratio for Susan and John is 75%.


Solution

  • Use a common table expression to calculate needed counts, e.g.

    with in_time as (
        select * 
        from stats
        where arrival >= (time '8:30' - interval '5 minutes')
        and arrival <= (time '8:30' + interval '5 minutes')
        and (day = 'midweek' or day = 'friday')
    ),
    count_in_time as (
        select employee_id, count(*)
        from in_time
        group by employee_id
    ),
    total_count as (
        select employee_id, count(*)
        from stats
        where day = 'midweek' or day = 'friday'
        group by employee_id
    )
    select 
        i.*, 
        c.count as in_time, 
        t.count as total_count, 
        round(c.count* 100.0/t.count, 2) as ratio
    from in_time i
    join count_in_time c using(employee_id) 
    join total_count t using(employee_id);
    

    Results:

     id | arrival  |   day   | employee_id | in_time | total_count | ratio 
    ----+----------+---------+-------------+---------+-------------+-------
     16 | 08:25:00 | midweek |           1 |       3 |           4 | 75.00
     11 | 08:28:00 | midweek |           1 |       3 |           4 | 75.00
      7 | 08:29:00 | midweek |           1 |       3 |           4 | 75.00
     17 | 08:35:00 | midweek |           2 |       3 |           4 | 75.00
     12 | 08:33:00 | midweek |           2 |       3 |           4 | 75.00
      8 | 08:31:00 | midweek |           2 |       3 |           4 | 75.00
     21 | 08:30:00 | friday  |           3 |       1 |           3 | 33.33
     22 | 08:30:00 | friday  |           4 |       1 |           4 | 25.00
    (8 rows)
    

    You can add an appropriate condition in the WHERE clause of the final query.

    If you want to get aggregated data only with employees and their ratios, use count() with filter:

    select employee_id, name, in_time* 1.0/ total as ratio
    from (
        select 
            employee_id,
            count(*) filter (where arrival >= time '8:30' - interval '5 minutes' and arrival <= time '8:30' + interval '5 minutes') as in_time,
            count(*) as total
        from stats
        where day in ('midweek', 'friday')
        group by 1
        ) s
    join employee e on e.id = s.employee_id
    where in_time* 1.0/ total >= 0.6;
    
     employee_id | name  |         ratio          
    -------------+-------+------------------------
               1 | John  | 0.75000000000000000000
               2 | Susan | 0.75000000000000000000
    (2 rows)