I have the following two tables in a PostgreSQL database:
dummy=# select * from employee;
id | name
----+-------
1 | John
2 | Susan
3 | Jim
4 | Sarah
(4 rows)
dummy=# select * from stats;
id | arrival | day | employee_id
----+----------+------------+-------------
2 | 08:31:34 | monday | 2
4 | 08:15:00 | monday | 3
5 | 08:43:00 | monday | 4
1 | 08:34:00 | monday | 1
7 | 08:29:00 | midweek | 1
8 | 08:31:00 | midweek | 2
9 | 08:10:00 | midweek | 3
10 | 08:40:00 | midweek | 4
11 | 08:28:00 | midweek | 1
12 | 08:33:00 | midweek | 2
14 | 08:21:00 | midweek | 3
15 | 08:45:00 | midweek | 4
16 | 08:25:00 | midweek | 1
17 | 08:35:00 | midweek | 2
18 | 08:44:00 | midweek | 4
19 | 08:10:00 | friday | 1
20 | 08:40:00 | friday | 2
21 | 08:30:00 | friday | 3
22 | 08:30:00 | friday | 4
(19 rows)
I want to select all employees that arrive between 8:25
and 8:35
on midweek
and friday
. I can accomplish that relatively easy with the following query:
SELECT * FROM stats
WHERE
arrival >= (time '8:30' - interval '5 minutes')
AND
arrival <= (time '8:30' + interval '5 minutes')
AND
(day = 'midweek' or day = 'friday');
However, an additional criterion is that I only want to select those employees that arrive at least 60% of the time within the aforementioned time window. This is where I am stuck. I do not know how to calculate that ratio.
What does the Query look like which fulfills all the criteria?
CLARIFICATION
Apparently the above description for the ratio is misleading.
When calculating the ratio then only the rows that meet the criteria (day = 'midweek' or day = 'friday')
shall be considered. So in the sample data John and Susan show up four times for work on midweek
and friday
. Three out of those four times they are punctual. Hence, the ratio for Susan and John is 75%
.
Use a common table expression to calculate needed counts, e.g.
with in_time as (
select *
from stats
where arrival >= (time '8:30' - interval '5 minutes')
and arrival <= (time '8:30' + interval '5 minutes')
and (day = 'midweek' or day = 'friday')
),
count_in_time as (
select employee_id, count(*)
from in_time
group by employee_id
),
total_count as (
select employee_id, count(*)
from stats
where day = 'midweek' or day = 'friday'
group by employee_id
)
select
i.*,
c.count as in_time,
t.count as total_count,
round(c.count* 100.0/t.count, 2) as ratio
from in_time i
join count_in_time c using(employee_id)
join total_count t using(employee_id);
Results:
id | arrival | day | employee_id | in_time | total_count | ratio
----+----------+---------+-------------+---------+-------------+-------
16 | 08:25:00 | midweek | 1 | 3 | 4 | 75.00
11 | 08:28:00 | midweek | 1 | 3 | 4 | 75.00
7 | 08:29:00 | midweek | 1 | 3 | 4 | 75.00
17 | 08:35:00 | midweek | 2 | 3 | 4 | 75.00
12 | 08:33:00 | midweek | 2 | 3 | 4 | 75.00
8 | 08:31:00 | midweek | 2 | 3 | 4 | 75.00
21 | 08:30:00 | friday | 3 | 1 | 3 | 33.33
22 | 08:30:00 | friday | 4 | 1 | 4 | 25.00
(8 rows)
You can add an appropriate condition in the WHERE clause of the final query.
If you want to get aggregated data only with employees and their ratios, use count() with filter:
select employee_id, name, in_time* 1.0/ total as ratio
from (
select
employee_id,
count(*) filter (where arrival >= time '8:30' - interval '5 minutes' and arrival <= time '8:30' + interval '5 minutes') as in_time,
count(*) as total
from stats
where day in ('midweek', 'friday')
group by 1
) s
join employee e on e.id = s.employee_id
where in_time* 1.0/ total >= 0.6;
employee_id | name | ratio
-------------+-------+------------------------
1 | John | 0.75000000000000000000
2 | Susan | 0.75000000000000000000
(2 rows)