Search code examples
c#performanceentity-frameworklinqquery-optimization

Using IQueryable.Contains inside of the query: would this IQueryable get executed multiple times?


I have a IQuearyable:

 var ids = entities.Select(s => s.Id).OrderBy(s => s);

If it is used to filter down when retrieving other entities:

    EntityService
      .For<OtherEntity>()
      .GetAll()
      .Where(s => ids.Contains(s.EntityId))

Does the original ids query gets executed multiple times? And, thus, it would be better to bring it into the memory before the second call?


Solution

  • Short answer: "No".

    Long answer: Your example looks to have bigger problems.

    ORMs like EF excel at mapping relationships between tables as an object structure. Say for instance I have an Order table and a Delivery table. Each Order has 0 or 1 Deliveries depending on whether it's delivered or not. So either the Delivery table has an OrderId, or the Order table has a DeliveryId. In the purest example the Delivery table would use OrderId as it's PK and FK to cement that one-to-one/zero relationship.

    Now in SQL land there are a couple of ways you can get at the desired deliveries. You could single-step it by querying the Orders table to get the OrderIds or DeliveryIds for the applicable deliveries, then query the Deliveries table using an IN clause against those Ids. Alternatively you could JOIN the two tables on the respective FK.

    In your example you are doing the former, which will work, but has a few drawbacks. Performance isn't great as you will be doing two round trips to the database and have to store those Ids to feed the 2nd query. There are also limitations to how many values you can feed the resulting IN clause in the second query. If your first query has the potential to return 100,000 order IDs, you're in for a rough time. Sure when a system starts out it's only dealing with hundreds or maybe thousands of rows, but code written this way can soon bog down and suddenly fail as the system gets some real-world data behind it.

    EF supports the later my means of the use of navigation properties, or explicit joining. If you have FKs between the related tables then navigation properties are by far the way to go. Explicit joining should be reserved for the rare cases where you need to deal with non-normalized relationships where you cannot use a FK. Examples of this are data anti-patterns like tables that use an OwnerType+OwnerId structure to share themselves between multiple other tables. Classic examples are things like an Address table where rather than using joining tables like CustomerAddress and CompanyAddress to link Address to a Customer table vs. a Company table, they add a OwnerType of "Customer" or "Company", and set the OwnerId to the respective Id in the applicable table. The drawback here is that we cannot establish a FK relationship between Address and either of the related tables because the value relates to both tables (resulting in it being enforced for neither) In these cases we would need to use an explicit Join with applicable filter for OwnerType, or resort to the IN approach.

    With navigation properties our Order entity would have a Delivery navigation property set up to use either the OrderId between the two tables or the Order's DeliveryId depending on how the relationship is set up. The Delivery entity can in turn have a navigation property back to the Order if we like. Relationships can be uni-directional or bi-directional, but ultimately we can get by with either and I recommend using uni-directional navigation properties unless a bi-directional is absolutely needed. (It saves on potential complications when re-associating relationships between entities if those relationships need to be alterable)

    In our entities we'd have something like:

    [Table("Orders")]
    public class Order
    {
       [Key, DatabaseGenerated(DatabaseGeneratedOption.Identity)]
        public int OrderId { get; set; }
        public DateTime OrderDateTime { get; set; }
    
       // Other order fields...
    
       public virtual Delivery Delivery { get; set; }
    }
    

    So with a navigation property between Order and Delivery, if we want the delivery details for all orders created on a specific date:

    startDate = startDate.Date; // strip any time.
    var endDateTime = startDate.AddDays(1).AddTicks(-1);
    
    var orderDeliveryDetails = _context.Orders
         .Where(o => o.OrderDateTime >= startDate && o.OrderDateTime <= endDateTime)
         .Select(o => new { o.OrderId, o.Delivery })
         .ToList();
    

    We could have just gone to the Delivery entities and queried where d => d.Order.OrderDateTime..... if we have a bi-directional reference back to Order. However this would have only returned deliveries for orders that had deliveries. In the first example I return an anonymous type that returns every applicable Order ID, and a Delivery if it has one. That will return #null for the delivery if the Order hasn't had a Delivery recorded against it.

    Situations where we need to consider IN type queries would be cases where we need to link tables that are in two completely different databases and managed by different DbContexts. When doing something like this you should consider paginating the data to ensure that the resulting queries don't get out of control. Setting up linked views between the databases would be a good option so that you could establish entities within a single DbContext with a relationship to use to query the data.

    The next potential problem is statements like this:

    EntityService
      .For<OtherEntity>()
      .GetAll()
    

    "GetAll()" looks like a huge red flag if this results in something like a ToList() or AsEnumerable() against an IQueryable returned by that "For" method. Doing so would result in loading ALL OtherEntity rows into memory before you applied your Where clause. The ramifications of that should be pretty obvious. I'm very wary of all attempts to abstract the EF DbContext, especially Generic implementations. These more often than not only accomplish putting a straight-jacket on EF leading to performance and behaviour issues.