c# linq group-by linq-to-entities

Using GroupBy to compute average or count based on the whole data until the corresponding date


I have an AssessmentItems DB object whose records capture which user performed an evaluation (EvaluatorId), which submission was evaluated (SubmissionId), which rubric item (criterion) it was scored against (RubricItemId), and when (DateCreated).

I group these records by RubricItemId and DateCreated to compute daily statistics for each assessment criterion (rubric item).

For example, I compute the AverageScore, which works fine and returns an output like: RubricItem: 1, Day: 15/01/2019, AverageScore: 3.2.

var result = _context.AssessmentItems
        .Include(ai => ai.RubricItem)
        .Include(ai => ai.Assessment)
        .Where(ai => ai.RubricItem.RubricId == rubricId && ai.Assessment.Submission.ReviewRoundId == reviewRoundId)
        .Select(ai => new
        {
            ai.Id,
            DateCreated = ai.DateCreated.ToShortDateString(),//.ToString(@"yyyy-MM-dd"),
            ai.CurrentScore,
            ai.RubricItemId,
            ai.Assessment.SubmissionId,
            ai.Assessment.EvaluatorId

        })
        .GroupBy(ai => new { ai.RubricItemId, ai.DateCreated })
        .Select(g => new
        {
            g.Key.RubricItemId,
            g.Key.DateCreated,
            AverageScore = g.Average(ai => ai.CurrentScore),
            NumberOfStudentsEvaluating = g.Select(ai => ai.EvaluatorId).Distinct().Count(),

        }).ToList();

What I want is the average up to that day: instead of the average for a single day, I want the average over all assessment scores from that day and the preceding days. In the same way, NumberOfStudentsEvaluating should give the total number of students who have participated in the evaluation up to that day.

One approach to achieve this could be to iterate through the result object and compute these properties again:

 foreach (var i in result)
 {
     i.AverageScore = result.Where(r => r.DateCreated <= i.DateCreated).Select(r => r.AverageScore).Average();
 }

But this is quite costly. Is it possible to tweak the code a bit to achieve this, or should I start from scratch with another approach?
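A side note on the loop above: besides its cost, it averages the daily averages, which is generally not the same as averaging every individual score up to that day, because days with more assessments should weigh more. The sketch below illustrates the difference; the scores and the `AverageOfAveragesDemo` class are made up for illustration and are not part of the original code.

```csharp
using System;
using System.Linq;

public static class AverageOfAveragesDemo
{
    // Hypothetical scores: two assessments on day 1, one on day 2.
    public static readonly double[] Day1 = { 2.0, 4.0 };
    public static readonly double[] Day2 = { 5.0 };

    // Mean of the daily means, which is what looping over the grouped result computes.
    public static double MeanOfDailyMeans() =>
        new[] { Day1.Average(), Day2.Average() }.Average();

    // Mean over every individual score up to and including day 2.
    public static double PooledMean() =>
        Day1.Concat(Day2).Average();

    public static void Main()
    {
        Console.WriteLine(MeanOfDailyMeans()); // (3 + 5) / 2 = 4
        Console.WriteLine(PooledMean());       // (2 + 4 + 5) / 3, about 3.67
    }
}
```

So a cumulative average needs either the raw scores or a running sum and count, not just the per-day averages.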


Solution

  • If you split the query into two parts, you can compute the average the way you want (I also computed NumberOfStudentsEvaluating over the same items), but I am not sure whether EF/EF Core will be able to translate it to SQL:

    var base1 = _context.AssessmentItems
        .Include(ai => ai.RubricItem)
        .Include(ai => ai.Assessment)
        .Where(ai => ai.RubricItem.RubricId == rubricId && ai.Assessment.Submission.ReviewRoundId == reviewRoundId)
        .Select(ai => new {
            ai.Id,
            ai.DateCreated,
            ai.CurrentScore,
            ai.RubricItemId,
            ai.Assessment.SubmissionId,
            ai.Assessment.EvaluatorId
    
        })
        .GroupBy(ai => ai.RubricItemId);
    
    var ans1 = base1
        .SelectMany(rig => rig
            .Select(ai => ai.DateCreated).Distinct()
            .Select(DateCreated => new
            {
                RubricItemId = rig.Key,
                DateCreated,
                Items = rig.Where(b => b.DateCreated <= DateCreated)
            }))
        .Select(g => new {
            g.RubricItemId,
            DateCreated = g.DateCreated.ToShortDateString(), //.ToString(@"yyyy-MM-dd"),
            AverageScore = g.Items.Average(ai => ai.CurrentScore),
            NumberOfStudentsEvaluating = g.Items.Select(ai => ai.EvaluatorId).Distinct().Count(),
        }).ToList();
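If the provider cannot translate the query above, one possible fallback is to load the filtered rows into memory and carry running totals forward day by day. The sketch below is only an illustration under assumptions: `ScoreRow`, `CumulativeStat`, and `CumulativeStats` are hypothetical names, `CurrentScore` is assumed to be a `double`, `EvaluatorId` an `int`, and `DateCreated` a `DateTime`. It groups on `DateCreated.Date`, since a `DateTime` with a time component would otherwise produce one row per distinct timestamp rather than one per day. Records require C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flat row, standing in for the anonymous type selected in the query.
public record ScoreRow(int RubricItemId, DateTime DateCreated, double CurrentScore, int EvaluatorId);

// Hypothetical result row: statistics over all assessments up to and including Day.
public record CumulativeStat(int RubricItemId, DateTime Day, double AverageScore, int NumberOfStudentsEvaluating);

public static class CumulativeStats
{
    public static List<CumulativeStat> Compute(IEnumerable<ScoreRow> rows)
    {
        var stats = new List<CumulativeStat>();

        foreach (var rubricGroup in rows.GroupBy(r => r.RubricItemId).OrderBy(g => g.Key))
        {
            // Totals are reset for each rubric item and carried across its days.
            double runningSum = 0;
            int runningCount = 0;
            var evaluatorsSoFar = new HashSet<int>();

            // Visit the days in chronological order.
            foreach (var day in rubricGroup.GroupBy(r => r.DateCreated.Date).OrderBy(g => g.Key))
            {
                foreach (var row in day)
                {
                    runningSum += row.CurrentScore;
                    runningCount++;
                    evaluatorsSoFar.Add(row.EvaluatorId);
                }

                stats.Add(new CumulativeStat(
                    rubricGroup.Key, day.Key, runningSum / runningCount, evaluatorsSoFar.Count));
            }
        }

        return stats;
    }

    public static void Main()
    {
        // Made-up sample data: evaluators 10 and 11 on 15 Jan, evaluator 10 again on 16 Jan.
        var rows = new[]
        {
            new ScoreRow(1, new DateTime(2019, 1, 15, 9, 0, 0), 2.0, 10),
            new ScoreRow(1, new DateTime(2019, 1, 15, 17, 0, 0), 4.0, 11),
            new ScoreRow(1, new DateTime(2019, 1, 16, 8, 0, 0), 5.0, 10),
        };

        foreach (var stat in Compute(rows))
        {
            Console.WriteLine(stat);
        }
    }
}
```

Each row is visited once after sorting, whereas re-filtering the whole group for every distinct date rescans the earlier rows each time. The trade-off is that all matching rows are pulled to the client, which may or may not be acceptable for your data volume.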