sql sql-server average aggregate-functions greatest-n-per-group

Calculate query with some special case

I have a table

Exam (studentID, subjectCode, examDate, examScore)

and a lot of values like

INSERT [dbo].[Exam] ([studentID], [subjectCode], [examDate], [examScore]) VALUES (N'ST01', N'PRJ', CAST(0x253C0B00 AS Date), 5)
INSERT [dbo].[Exam] ([studentID], [subjectCode], [examDate], [examScore]) VALUES (N'ST01', N'WIG', CAST(0x253C0B00 AS Date), 1)
 . . .
INSERT [dbo].[Exam] ([studentID], [subjectCode], [examDate], [examScore]) VALUES (N'ST09', N'WIG', CAST(0x253C0B00 AS Date), 4)
INSERT [dbo].[Exam] ([studentID], [subjectCode], [examDate], [examScore]) VALUES (N'ST09', N'WIG', CAST(0x263C0B00 AS Date), 9)
INSERT [dbo].[Exam] ([studentID], [subjectCode], [examDate], [examScore]) VALUES (N'ST10', N'MAD', CAST(0x253C0B00 AS Date), 3)
INSERT [dbo].[Exam] ([studentID], [subjectCode], [examDate], [examScore]) VALUES (N'ST10', N'MAE', CAST(0x253C0B00 AS Date), 4)

I have to calculate all average score of students such that: (Sorry for the horrible format)

s_i is the average score of the student i^th with i=1…M where M is the total number of students N is the total number of subjects. The average score of the student s_i can be calculated by:

Where x_(i,k) is the last exam’s score of the student s_i on the subjectk^th. The last exam’s score means: student may retake the exam, and only the final result will be involved into the average score calculation. The x_(i,k) may receive Null value if student s_i did not take exam on subject k^th.

I can merely do something like

select studentID, sum(examScore)/count(subjectCode) from Exam group by studentID

How can I process forward the problem with these special cases in the second half?

Solution

As I understand your question, you want the average per student, taking into account only its latest score per subject. If so, you can use window functions for filtering, then agregation:

select studentid, avg(examscore) avgscore
from (
    select e.*, 
        row_number() over(partition by studentid, subjectcode order by examdate desc) rn
    from exam e
) e
where rn = 1
group by studentid

If you want to count "missing" subjects as 0, then it's a bit more complicated. You would typically cross join the list of distinct students and subjects, then bring the table with a left join:

select st.studentid, avg(coalesce(e.examscore, 0)) avgscore
from (select distinct studentid from exam) st
cross join (select distinct subjectcode from exam) su
left join (
    select e.*, 
        row_number() over(partition by studentid, subjectcode order by examdate desc) rn
    from exam e
) e 
    on  e.studentid   = st.studentid 
    and e.subjectcode = su.subjectcode 
    and e.rn = 1
group by st.studentid

In a real life situation, you would probably have separate referential tables to store the students and subjects, which you would use instead of the select distinct_ subqueries.