Search code examples
sqlpostgresqljoinsubquerycase

PostgreSQL: Replace value of one table by mean value of another table when criteria is met


Imagine the following two CTEs:

  • table_a shows the number of exempt students and the mean exam score (excluding exempt students) per category, per month.

enter image description here

  • table_b shows a breakdown of each student information (using the month of January only as an example). Note that output in table_a is obtained through information of table_b.

enter image description here

As exempt students were under mitigating circumstances, their exam scores won't be considered. Instead, I would like to assign to them the mean score of all students that were examined as shown below:

enter image description here

What is the best way of achieving this? I'm struggling to see how to join the tables when for one of the table the timestamp is actually a truncated date as it provides grouped information.

with table_a (exam_month, category, "# exempt students", "mean score (excl. exempt students")
as (values 
('2022-01-01 12:00:00'::timestamp, 'A', 2, 61.8),
('2022-02-01 12:00:00'::timestamp, 'A', 1, 63.1),
('2022-03-01 12:00:00'::timestamp, 'A', 2, 61.1),
('2022-01-01 12:00:00'::timestamp, 'B', 4, 73.8),
('2022-02-01 12:00:00'::timestamp, 'B', 3, 71.3),
('2022-03-01 12:00:00'::timestamp, 'B', 2, 72.5)
),

table_b (student_id, exam_timestamp, category, score, student_status)
as (values
(1, '2022-01-10 09:30:00'::timestamp, 'A', 13.1, 'exempt'),
(2, '2022-01-12 12:00:00'::timestamp, 'A', 63.7, 'examined'),
(3, '2022-01-14 09:00:00'::timestamp, 'A', 59.2, 'examined'),
(4, '2022-01-16 14:30:00'::timestamp, 'A', 62.8, 'examined'),
(5, '2022-01-18 12:30:00'::timestamp, 'A', 41.2, 'exempt'),
(6, '2022-01-20 12:00:00'::timestamp, 'A', 61.5, 'examined')
)

Solution

  • You can join by using date_trunc() and by casting to date:

    with table_a (exam_month, category, "# exempt students", "mean score (excl. exempt students")
    as (
     values 
      ('2022-01-01 12:00:00'::timestamp, 'A', 2, 61.8),
      ('2022-02-01 12:00:00'::timestamp, 'A', 1, 63.1),
      ('2022-03-01 12:00:00'::timestamp, 'A', 2, 61.1),
      ('2022-01-01 12:00:00'::timestamp, 'B', 4, 73.8),
      ('2022-02-01 12:00:00'::timestamp, 'B', 3, 71.3),
      ('2022-03-01 12:00:00'::timestamp, 'B', 2, 72.5)
    ), table_b (student_id, exam_timestamp, category, score, student_status)
    as (
     values
      (1, '2022-01-10 09:30:00'::timestamp, 'A', 13.1, 'exempt'),
      (2, '2022-01-12 12:00:00'::timestamp, 'A', 63.7, 'examined'),
      (3, '2022-01-14 09:00:00'::timestamp, 'A', 59.2, 'examined'),
      (4, '2022-01-16 14:30:00'::timestamp, 'A', 62.8, 'examined'),
      (5, '2022-01-18 12:30:00'::timestamp, 'A', 41.2, 'exempt'),
      (6, '2022-01-20 12:00:00'::timestamp, 'A', 61.5, 'examined')
    )
    select b.student_id, b.exam_timestamp, b.category, 
           case 
             when b.student_status = 'exempt' then a."mean score (excl. exempt students"
             else b.score
           end as score,
           b.student_status
      from table_b b
           join table_a a
             on a.category = b.category
            and date_trunc('month', b.exam_timestamp)::date = a.exam_month::date;
    

    All that said, you do not need table_a for this calculation:

    with table_b (student_id, exam_timestamp, category, score, student_status)
    as (
     values
      (1, '2022-01-10 09:30:00'::timestamp, 'A', 13.1, 'exempt'),
      (2, '2022-01-12 12:00:00'::timestamp, 'A', 63.7, 'examined'),
      (3, '2022-01-14 09:00:00'::timestamp, 'A', 59.2, 'examined'),
      (4, '2022-01-16 14:30:00'::timestamp, 'A', 62.8, 'examined'),
      (5, '2022-01-18 12:30:00'::timestamp, 'A', 41.2, 'exempt'),
      (6, '2022-01-20 12:00:00'::timestamp, 'A', 61.5, 'examined')
    )
    select b.student_id, b.exam_timestamp, b.category, 
           (case 
             when b.student_status = 'exempt' 
               then sum(case
                          when b.student_status = 'exempt' then 0
                          else b.score
                        end) over w
                    / 
                    sum((b.student_status = 'examined')::int) over w
             else b.score
           end)::numeric(4,1) as score,
           b.student_status
      from table_b b
    window w as (partition by date_trunc('month', b.exam_timestamp), category)
    ;
    

    db<>fiddle here