Search code examples
c#entity-frameworklinqentity-framework-6

Problem in calculating statistics using group by based on statuses


I have a data like below :

Table : LeaveRequest

Id    DepartmentId
1     100

Table: LeaveUpdateLogs

Id      RequestedDate             LeaveRequestId     Status
1       2020-01-26 11:55:56       1                  Pending
2       2020-02-24 10:55:56       1                  Accepted
3       2020-02-24 11:55:56       1                  Accepted
4       2020-03-01 09:55:56       1                  Declined
5       2020-03-27 10:55:56       1                  Closed

6       2020-01-09 05:55:56       2                  Pending
6       2020-02-09 05:55:56       2                  Accepted
7       2020-05-12 02:55:56       2                  Accepted
8       2020-06-14 05:55:56       2                  Declined
9       2020-06-15 05:55:56       2                  Closed 

I want to calculate statistics for between Start date and EndDate and want to count statistics for each of the status.

Expected output: Start Date = 01-01-2020 EndDate = 06-30-2020

Pending =  2 (2020-01-26 11:55:56,2020-01-09 05:55:56)
Accepted = 3 (2020-02-24 11:55:56,2020-02-09 05:55:56,2020-05-12 02:55:56)
Declined = 0
Closed = 2 (2020-03-27 10:55:56, 2020-06-15 05:55:56)

Classes:

public class LeaveRequest 
    {

        public int Id { get; set; }
        
        public int DepartmentId { get; set; }

        public virtual ICollection<LeaveUpdateLogs> LeaveUpdateLogs { get; set; }

    }
    
    public class LeaveUpdateLogs 
    {
    
        public int Id { get; set; }

        public DateTimeOffset RequestedDate { get; set; }
        
        public int LeaveRequestId { get; set; }
        
        public string Status { get; set; }

        public virtual LeaveRequest LeaveRequest { get; set; }

    }

Query:

var query = from l in context.LeaveUpdateLogs
            where l.LeaveRequest.DepartmentId == 100 &&
            (l.RequestedDate >= fromDate && l.RequestedDate < toDate)

Only challenge here for me is to calculate the data for the same month and same LeaveRequestId. For example :

Id      RequestedDate             LeaveRequestId     Status
4       2020-03-01 09:55:56       1                  Declined
5       2020-03-27 10:55:56       1                  Closed

For above data, we have 2 status in the same month and for the same LeaveRequestId i.e 1 but I want to consider last date for that month("2020-03-27") and increment 1 for that status("Closed").

I will really appreciate any help :)


Solution

  • It looks like you want the "most recent status per leaveupdatelog leaverequestid-month" then generate statistics per status, which would imply to me to be two groupings. The first grouping knocks out the uninteresting statuses and the second counts them

    var interesting = query.GroupBy(lul => new{lul.LeaveRequestId, D=new DateTime(lul.RequestedDate.Year, lul.RequestedDate.Month, 1)})
        .Select(g => g.OrderByDescending(gg => gg.RequestedDate).First());
    

    This groups the data by ID and month of requested date. It means you get groups where there are two members for 2,3 and 4,5 and 8,9 then selects only the latest dated one, removing 2,4,8

    Then we can perform another grouping on the remainder

    var stats = interesting.GroupBy(lul => lul.Status);
    

    This realizes a collection where each item in stats has a key of status and a collection of leaveupdatelogs (should really rename that class to be singular, classes shouldn't have plural names) with the data, to whit:

    foreach(var g in stats)
        Console.WriteLine($"{g.Key} = {g.Count} ({string.Join(',', g.Select(gg => gg.RequestedDate))}");
    

    should produce the output you expect, with the exception that there are no Declined so nothing will print for them. If it's important to not have them "notified by their absence" then perhaps consider another operation to reveal them such as

    someArrayOfAllStatus.Except(stats.Select(g=>g.Key))
    

    You could generate the array of all status from the enum itself or the original query (using select/distinct) depending on whether there are other status in the enum that you never want to show because they aren't in the query

    Important to remember that LINQ group isn't like sql group. In sql you have to specify aggregations and throw data away because you can't have a GROUP BY without a SELECT. In LINQ you can, so the group operation literally forms the keyed buckets and puts all the data inside them as a collection, so at any step all the original data is there to be manipulated. In other words, LINQ group just breaks a dataset of X records into Y lots of Z records (where Y * Avg(Z) = X) so you can iterate each Y and do things like "take only the first Z" (which is what we did firstly) or "count and stringjoin all the Z" (which is what we did secondly)