Search code examples
c#.net-coreentity-framework-corelinq-to-sql

Entity Framework Core query where nested collection matches another collection returns wrong results


I am having a problem with Entity Framework Core 5.x where the "wrong" query is being generated when trying to find the collection of entities where a navigation property collection contains all elements from another list.

Given the following simplified (but standard many-to-many) model

public class Announcement {
    public string Id { get; set; }
    public string Title { get; set; }
    public virtual IEnumerable<AnnouncementTag> Tags { get; set; }
}
    
public class Tag {
    public string Id { get; set; }
    public bool IsEnabled { get; set; } 
    public virtual IEnumerable<AnnouncementTag> Announcements{ get; set; }
}

// Needed since EF Core does not support many-to-many without explicit JOIN class
public class AnnouncementTag {
    public string AnnouncementId { get; set; }
    public string TagId { get; set; }
    public Announcement Announcement { get; set; }
    public Tag Tag { get; set; }
}

I would like to get the list of Announcements which have all of their tags match what is in a given list.

For example, with the following data

Announcement 1    has Tags A, B
Announcement 2    has Tags B
Announcement 3    has Tags C

A query for "all announcements with tags ["B"]" would return Announcement 2 since Announcement 3 has nothing that matches and Announcement 1 only matches on one tag.

I would have thought that the following LINQ query for return the results that I want

var applicableTags = new [] { "B" };

var results = db.Announcements.Where(a => a.Tags.All(at => applicableTags.Contains(at.Tag.Id));

But that results in all Announcements coming back. Just to simplify the query, I modified it to use the JOIN class directly (TagId = Tag.Id)

var results = db.Announcements.Where(a => a.Tags.All(at => applicableTags.Contains(at.TagId));

But that results in the same where all Announcements are returned. The generated query seems odd

SELECT * FROM [Announcements] AS [a]
WHERE NOT EXISTS (
    SELECT 1
    FROM [AnnouncementTag] AS [a1]
    INNER JOIN [Tags] AS [t1] ON [a1].[TagId] = [t1].[Id]
    -- Really... <>... opposite of what I would have expected
    -- When I query for more than one Tag...say A,B...the <> becomes "NOT IN"
    WHERE ([a].[Id] = [a1].[AnnouncementId]) AND ([t1].[Id] <> N'B')
)

I was able to duplicate this behavior with LINQ-to-Objects so it does not seem to be an Entity Framework thing... just me not understanding how to generate the query.

I did find one article that stated the following work

// Notice how the applicableTags and Tags criteria is flipped from previous query
var results = db.Announcements.Where(a => applicableTags.All(at => a.Tags.Any(t => t.Id == at)));

And it does...but only for LINQ-to-Objects. For Entity Framework, you get the following exception

System.InvalidOperationException: The LINQ expression 'DbSet<Announcement>()
  .Where(a => __applicableTags_0
      .All(at => DbSet<AnnouncementTag>()
          .Where(a0 => EF.Property<string>(a, "Id") != null && object.Equals(
              objA: EF.Property<string>(a, "Id"),
              objB: EF.Property<string>(a0, "AnnouncementId")))
          .Any(a0 => a0.TagId == at)))' could not be translated. Either rewrite the query in a form that can be translated, or switch to client evaluation explicitly by inserting a call to 'AsEnumerable', 'AsAsyncEnumerable', 'ToList', or 'ToListAsync'

But I would not want to have to bring back all Announcements in order to further filter on the server.

Any suggestions on what I am doing wrong? Can somebody please explain the flaw in my original query that is returning the unexpected results?


Solution

  • Assuming that tags are unique, you can express your query in the following way:

    var tagCount = applicableTags.Length;
    
    var results = db.Announcements
        .Where(a => a.Tags.Where(t => applicableTags.Contains(t.Id)).Count() >= tagCount);