Tags: c#, entity-framework, denormalization

EF denormalize result of each group join


I have a 1-to-many relationship between a user and his/her schools. I often want to get the primary school for the user (the one with the highest "Type"). This results in having to join the primary school for every query I want to run. A user's schools barely ever change. Are there best practices on how to do this to avoid the constant join? Should I denormalize the models and if so, how? Are there other approaches that are better?

Thanks.

public class User
{
   public int Id { get; set; }
   public virtual IList<UserSchool> UserSchools { get; set; }
   ...
}

public class UserSchool
{
   public int UserId { get; set; }
   public string Name { get; set; }
   public int Type { get; set; }   // the school with the highest Type per user is the "primary" school
   ...
}

...

// For each user, pick the single school with the highest Type (the primary school).
var schools = (from r in _dbcontext.UserSchools
               group r by r.UserId into grp
               select grp.OrderByDescending(x => x.Type).FirstOrDefault());

// Join every user to that primary school for the final projection.
var results = (from u in _dbcontext.Users
               join us in schools on u.Id equals us.UserId
               select new UserContract
               {
                   Id = u.Id,
                   School = us.Name
               });

Solution

In past projects where I opted to denormalize data, I denormalized it into separate tables that were updated in the background by the database itself, and kept as much of that process as possible inside the database software, which handles these things much better. Note that any sort of "run every x seconds" refresh means your data lags slightly behind reality. That doesn't sound like a problem here: your schools barely ever change, so being a few seconds (or minutes, or days, by the sound of it) out of date is not a big concern. If you're considering denormalization, retrieval speed must matter far more than freshness.
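Translated to the models in the question, that might look something like the sketch below. UserPrimarySchool and the UserPrimarySchools set are hypothetical names for the denormalized table and its mapping; the background refresh itself would live in the database (a trigger or a scheduled job) and is not shown here.

// Hypothetical read-only entity mapped to a denormalized table that the
// database refreshes in the background (e.g. via a trigger or scheduled job).
public class UserPrimarySchool
{
    public int UserId { get; set; }   // primary key: one row per user
    public string Name { get; set; }  // name of that user's highest-Type school
}

// With the table in place, the per-request query collapses to a plain key join:
var results = from u in _dbcontext.Users
              join ps in _dbcontext.UserPrimarySchools on u.Id equals ps.UserId
              select new UserContract
              {
                  Id = u.Id,
                  School = ps.Name
              };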

I have never had hard and fast criteria for when to denormalize, but in general the data must be:

1. Accessed often. Multiple-times-per-page-load often; absolutely-critical-to-the-application often. Retrieval time must be paramount.
2. Time insensitive. If the data is changing all the time, and it is critical that what you retrieve is up-to-the-minute, denormalization will have too much overhead to buy you much benefit.
3. Either an extremely large data set or the result of a relatively complex query. Simple joins can usually be handled by proper indexing, and maybe an indexed view.
4. Already optimized as much as possible. You've already tried things like indexed views, reorganizing indexes, and rewriting the underlying queries (one such rewrite is sketched after this list), and things are still too slow.
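On that last point, here is one such rewrite for the query in the question: the separate group-and-join can be collapsed into a single correlated subquery over the UserSchools navigation property. This is a sketch, not a guaranteed win; compare the generated SQL and timings against your own data before reaching for denormalization.

var results = _dbcontext.Users
    .Select(u => new UserContract
    {
        Id = u.Id,
        // Correlated subquery: the name of each user's highest-Type school.
        School = u.UserSchools
                  .OrderByDescending(s => s.Type)
                  .Select(s => s.Name)
                  .FirstOrDefault()
    });

Note that this behaves like a left join: users with no schools come back with School == null, whereas the original inner join filtered them out entirely.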

Denormalizing can be very helpful, but it introduces its own headaches, so you want to be very sure that you are ready to deal with those before you commit to it as a solution to your problem.
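As one example of those headaches: if the denormalized copy is maintained from application code rather than by the database, every write path that touches UserSchools has to remember to refresh it. A minimal sketch, reusing the hypothetical UserPrimarySchool table from above (MyDbContext stands in for whatever your context class actually is):

// Hypothetical helper: call this after a user's schools change so the
// denormalized row stays consistent with the source rows.
void RefreshPrimarySchool(MyDbContext db, int userId)
{
    var primary = db.UserSchools
                    .Where(s => s.UserId == userId)
                    .OrderByDescending(s => s.Type)
                    .FirstOrDefault();

    var cached = db.UserPrimarySchools.Find(userId);

    if (primary == null && cached != null)
        db.UserPrimarySchools.Remove(cached);          // user no longer has schools
    else if (primary != null && cached == null)
        db.UserPrimarySchools.Add(new UserPrimarySchool { UserId = userId, Name = primary.Name });
    else if (primary != null)
        cached.Name = primary.Name;                    // update the stale copy

    db.SaveChanges();
}

Forgetting to call something like this from even one write path silently serves stale data, which is exactly the kind of bug that makes denormalization expensive to live with.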