Search code examples
c#entity-framework-coreinner-join

How to fetch nested entities despite duplicates in EF?


I have DB with translations dictionary. It has entity "Word" and "Translation". Each translation is linked to one "source" word and one "target" word -- for example I could have translation with "cat" (English) and "kot" (Polish) which means "cat" translates to "kot", and "kot" translates to "cat".

When searching for given phrase (e.g. "cat") in dictionary I would like to fetch Word entity for this phrase and all words it is translated to. So the query I started from looks like this:

Word[] words = ctx.Words
  .Include(it => it.TranslationSrcWords) 
  .ThenInclude(it => it.SrcWord)
  .Include(it => it.TranslationDstWords)
  .ThenInclude(it => it.DstWord)
  ...

With such code I get error:

Entity Framework Core will automatically fix-up navigation properties to any other entities that were previously loaded into the context instance.

Of course the above query would result with word duplicates -- one fetched right at the start (ctx.Words) and then either from SrcWord or DstWord (one side of the translation has to point back to the word being found).

EF "optimization" forces me to either suppress the error or comment out nested include -- in both cases it leads to partial fetch. I.e. when I look for "cat" instead of getting translations like:

("cat", "kot")

I get translations like:

("cat", null)

The "cat" is fixed up indeed by EF, and null because I had to skip nested include so there is simply no data.

How to solve it, i.e. fetch data in one take?

Full repo is here: https://codeberg.org/macias/lala/src/branch/master Simplified models:

public sealed class Word
{
    public long Id { get; set; }
    public string Phrase { get; set; } = null!;

    public  ICollection<Translation> TranslationDstWords { get; set; } = default!;
    public  ICollection<Translation> TranslationSrcWords { get; set; } = default!;
}

public sealed class Translation
{
    public long Id { get; set; }
    public long SrcWordId { get; set; }
    public long DstWordId { get; set; }

    public  Word DstWord { get; set; } = null!;
    public  Word SrcWord { get; set; } = null!;
}


Solution

  • The error you are getting is because when you include one navigation side of a relationship (either collection or reference), the other side (aka inverse) navigation is automatically included by EF Core.

    In your sample, both

    .Include(it => it.TranslationSrcWords) 
        .ThenInclude(it => it.SrcWord)
    

    and

    .Include(it => it.TranslationDstWords) 
        .ThenInclude(it => it.DstWord)
    

    independently violate that rule and will cause the aforementioned exception.

    What you really need is to ThenInclude navigations other than the inverse. For instance:

    .Include(it => it.TranslationSrcWords) 
        .ThenInclude(it => it.DstWord)
    

    and

    .Include(it => it.TranslationDstWords) 
        .ThenInclude(it => it.SrcWord)
    

    This in general does what you need without EF Core errors.

    However, in this particular case the two many-to-one relationships effectively form single many-to-many self relationships. And taking into account that including navigation (with both Include or ThenInclude) automatically includes the inverse, you can easily see that it will be enough to include in the query just one of the above Include / ThenInclude requests. Because

    Include(word => word.TranslationSrcWords)
    

    automatically includes translation.SrcWord. Then

    ThenInclude(translation => translation.DstWord)
    

    automatically includes translation.DstWord.TranslationDstWords, and the cycle is closed.

    Shortly, for this example I would suggest including only one of the above Include / ThenInclude pairs. You can include both if you wish, and there won't be an error, but it would lead to much more inefficient SQL, and the result will be the same.