Using the tables below I would like to sum the dose for each site that is the same course, I want to first group by course and then site and then sum the dose.
Radiation
RadiationId | CourseId | Dose |
---|---|---|
1 | 1 | 10 |
2 | 1 | 20 |
3 | 2 | 5 |
RadiationSites
RadiationId | SiteId |
---|---|
1 | 1 |
1 | 2 |
1 | 3 |
2 | 3 |
3 | 1 |
Output
CourseId | SiteId | Total Dose | CourseName | Site |
---|---|---|---|---|
1 | 1 | 10 | Course 1 | Spleen |
1 | 2 | 10 | Course 1 | Spine |
1 | 3 | 30 | Course 1 | Eye |
2 | 1 | 5 | Course 2 | Spleen |
I tried something like the following but can't get my head around it.
var totals =
dbContext
.Radiations
.Include(r => r.RadiationSites)
.ThenInclude(s => s.Site)
.Include(r => r.RadiationCourse)
.Where(r => r.Nhi == NHI)
.SelectMany(s => s.RadiationSites, (Radiation, RadiationSites) => new { Radiation, RadiationSites } )
.GroupBy(r => r.Radiation.RadiationCourseId, r => r.RadiationSites.SiteId);
Is it even possible? Please help!
The thing I don't like about the other answers is that either:
Anyway, there are some important teaching points here. One of them is your task can be vastly easier depending on where you start. Let's do this with EF and see how it looks.
I replicated your DB as much as I could understand it from your post:
It doesn't matter if this doesn't match exactly, but it's helpful to have a diagram when writing code, to aid with naviagtion. Don't worry about minor differences, like maybe you don't have a RadiationSiteId..
Main point: your life would be considerably easier if you start from RadiationSites; it's the Many end of two relationships and only has parents. Being "in" a child object and navigating up to a parent is typically a lot easier than being in a parent and picking one of its many children
You don't need to Include; Include is not JOIN. There are more voluminous tomes you can read on this, but Include is for when you're downloading whole objects and you want to tell EF what other whole, related objects to bring in too. Grouping queries probably don't result in an output object that is wholly an object from the schema, but instead use some other projection to e.g. an anonymous type, so skip the Include. You don't need Include for EF to be able to "see" related data youre accessing - when you use the navigation properties in the Where, GroupBy and Select etc EF will see you travelling from one object to another and form the necessary joins for you in the background
var totals = await db.RadiationSites
.Where(rs => rs.Radiation.Nhi == nhi)
.GroupBy(rs => new { rs.Radiation.CourseId, rs.SiteId, rs.Radiation.Course.CourseName, rs.Site.SiteName })
.Select(g => new { g.Key.CourseId, g.Key.SiteId, TotalDose = g.Sum(rs => rs.Radiation.Dose), g.Key.CourseName, g.Key.SiteName })
.ToArrayAsync();
We start from RadiationSites, and Where up into Radiation to insist that NHI is some value. EF will see this and make the join, however it feels like doing so.
Because we're low down in the relationship hierarchy (thinking of it like a family tree where children are below parents), grouping is similarly easy; we only go as far up the tree as we need to. In some databases you could get away with just grouping by the primary key and the DB knows that implicitly all other columns from that table must be unique, but SQLS likes to have all columns grouped even if e.g. for a course ID of 1 there is never going to be any other value for the course name. Because we want the course name, we fetch it into the grouping key because it'll make life simpler later. We can reach CourseId
by only going up to Radiation.CourseId
, but we'll have to navigate further to get the name - Radiation.Course.CourseName
Ultimately, the way to have this not break your head is to look at your desired output; you want CourseId, SiteId, CourseName, and SiteName as the non grouped (the key) columns, and you want the Sum of the dose as an aggregated column. Put all those key columns into the object you group on: new { rs.Radiation.CourseId, rs.SiteId, rs.Radiation.Course.CourseName, rs.Site.SiteName }
. If you want to simplify things some you could always group on just the IDs, i.e. { rs.Radiation.CourseId, rs.SiteId }
, and fetch the related data later, maybe even by separate query
Then there is the Select, where you pull the key values out again (bit repetitive) but help yourself out by naming the input variable to the Select g
or similar, to reflect the fact that it's a group (a collection of collection-of-RadiationSites), with a .Key
that specifies what all the sites therein were grouped under, and itself is a collection of RadationSites. This is why I use g
inside the Select
, but rs
inside the Sum
, because the Sum is skipping over a "group" of RadiationSites and adding up the Dose
And there you have it: