Search code examples
sqlsql-serveraverageaggregate-functionsgreatest-n-per-group

Calculate query with some special case


I have a table

  • Exam (studentID, subjectCode, examDate, examScore)

and a lot of values like

INSERT [dbo].[Exam] ([studentID], [subjectCode], [examDate], [examScore]) VALUES (N'ST01', N'PRJ', CAST(0x253C0B00 AS Date), 5)
INSERT [dbo].[Exam] ([studentID], [subjectCode], [examDate], [examScore]) VALUES (N'ST01', N'WIG', CAST(0x253C0B00 AS Date), 1)
 . . .
INSERT [dbo].[Exam] ([studentID], [subjectCode], [examDate], [examScore]) VALUES (N'ST09', N'WIG', CAST(0x253C0B00 AS Date), 4)
INSERT [dbo].[Exam] ([studentID], [subjectCode], [examDate], [examScore]) VALUES (N'ST09', N'WIG', CAST(0x263C0B00 AS Date), 9)
INSERT [dbo].[Exam] ([studentID], [subjectCode], [examDate], [examScore]) VALUES (N'ST10', N'MAD', CAST(0x253C0B00 AS Date), 3)
INSERT [dbo].[Exam] ([studentID], [subjectCode], [examDate], [examScore]) VALUES (N'ST10', N'MAE', CAST(0x253C0B00 AS Date), 4) 

I have to calculate all average score of students such that: (Sorry for the horrible format)

s_i is the average score of the student i^th with i=1…M where M is the total number of students N is the total number of subjects. The average score of the student s_i can be calculated by:

enter image description here

Where x_(i,k) is the last exam’s score of the student s_i on the subjectk^th. The last exam’s score means: student may retake the exam, and only the final result will be involved into the average score calculation. The x_(i,k) may receive Null value if student s_i did not take exam on subject k^th.

enter image description here

I can merely do something like

select studentID, sum(examScore)/count(subjectCode) from Exam group by studentID

How can I process forward the problem with these special cases in the second half?


Solution

  • As I understand your question, you want the average per student, taking into account only its latest score per subject. If so, you can use window functions for filtering, then agregation:

    select studentid, avg(examscore) avgscore
    from (
        select e.*, 
            row_number() over(partition by studentid, subjectcode order by examdate desc) rn
        from exam e
    ) e
    where rn = 1
    group by studentid
    

    If you want to count "missing" subjects as 0, then it's a bit more complicated. You would typically cross join the list of distinct students and subjects, then bring the table with a left join:

    select st.studentid, avg(coalesce(e.examscore, 0)) avgscore
    from (select distinct studentid from exam) st
    cross join (select distinct subjectcode from exam) su
    left join (
        select e.*, 
            row_number() over(partition by studentid, subjectcode order by examdate desc) rn
        from exam e
    ) e 
        on  e.studentid   = st.studentid 
        and e.subjectcode = su.subjectcode 
        and e.rn = 1
    group by st.studentid
    

    In a real life situation, you would probably have separate referential tables to store the students and subjects, which you would use instead of the select distinct_ subqueries.