
Async vs Parallel.Invoke vs Task.WhenAll for an Entity Framework 6 query that gets data, in an ASP.NET MVC 5 web app


I'm trying to figure out the best approach, apart from synchronous programming, for running some EF6 queries that retrieve data. I'll post all 5 methods here (they take place in a controller action):

//would it be better to not "async" the ActionResult?
public async Task<ActionResult> Index(){
   // I depend on this so I don't even know if it's ok to make it async or not -> what do you think?
   var userinfo = _dataservice.getUserInfo("John");

   // C1: synchronous way
   var watch1 =  System.Diagnostics.Stopwatch.StartNew();
   var info1 = _getInfoService.GetSomeInfo1(userinfo);
   var info2 = _getInfoService.GetSomeInfo2(userinfo);
   watch1.Stop();
   var t1 = watch1.ElapsedMilliseconds; // this takes about 3200
   
   // C2: asynchronous way
   var watch2 =  System.Diagnostics.Stopwatch.StartNew();
   var infoA1 = await _getInfoService.GetSomeInfoAsync1(userinfo).ConfigureAwait(false);
   var infoA2 = await _getInfoService.GetSomeInfoAsync2(userinfo).ConfigureAwait(false);
   watch2.Stop();
   var t2 = watch2.ElapsedMilliseconds; // this takes about 3020

   // C2.1: asynchronous way launch then await
   var watch21 =  System.Diagnostics.Stopwatch.StartNew();
   var infoA21 = _getInfoService.GetSomeInfoAsync1(userinfo).ConfigureAwait(false);
   var infoA22 = _getInfoService.GetSomeInfoAsync2(userinfo).ConfigureAwait(false);
   // I thought that if I launched them first and then awaited, it would run faster... but it doesn't
   var a = await infoA21;
   var b = await infoA22;
   watch21.Stop();
   var t21 = watch21.ElapsedMilliseconds; // this takes about the same, 3020

   // C3: asynchronous with Task.Run() and await Task.WhenAll()
   var watch3 =  System.Diagnostics.Stopwatch.StartNew();
   var infoT1 = Task.Run(() => _getInfoService.GetSomeInfo1(userinfo));
   var infoT2 = Task.Run(() => _getInfoService.GetSomeInfo2(userinfo));
   await Task.WhenAll(infoT1, infoT2);
   watch3.Stop();
   var t3 = watch3.ElapsedMilliseconds; // this takes about 2010

   // C4: Parallel way
   MyType1 var1; MyType2 var2;
   var watch4 =  System.Diagnostics.Stopwatch.StartNew();
   Parallel.Invoke(
      () => var1 = _getInfoService.GetSomeInfoAsync1(userinfo).GetAwaiter().GetResult(),// also using just _getInfoService.GetSomeInfo1(userinfo) - but sometimes throws an Entity error on F10 debugging
      () => var2 = _getInfoService.GetSomeInfoAsync2(userinfo).GetAwaiter().GetResult()// also using just _getInfoService.GetSomeInfo2(userinfo)- but sometimes throws an Entity error on F10 debugging
   );
   watch4.Stop();
   var t4 = watch4.ElapsedMilliseconds; // this takes about 2012
}

Methods implementation:

public MyType1 GetSomeInfo1(SomeOtherType param){
 // result = some LINQ queries here
 Thread.Sleep(1000);
 return result;
}
public MyType2 GetSomeInfo2(SomeOtherType param){
 // result = some LINQ queries here
 Thread.Sleep(2000);
 return result;
}

public Task<MyType1> GetSomeInfoAsync1(SomeOtherType param){
 // result = some LINQ queries here
 Thread.Sleep(1000);
 return Task.FromResult(result);
}

public Task<MyType2> GetSomeInfoAsync2(SomeOtherType param){
 // result = some LINQ queries here
 Thread.Sleep(2000);
 return Task.FromResult(result);
}
  1. If I understood correctly, awaiting 2 tasks (as in C2 and C2.1) does not make them run in parallel (not even in the C2.1 example, where I launch them first and then await); it just frees the current thread and hands the tasks to 2 other threads that will deal with them.
  2. Task.Run() will in fact do just what Parallel.Invoke does, spreading the work across 2 different CPUs so that they run in parallel.
  3. Shouldn't launching them first and then awaiting (the C2.1 example) make them run in some sort of parallel way?
  4. Would it be better not to use async or parallel at all?

Please help me understand from these examples how I can have async and also better performance, and whether there are any implications with Entity Framework that I must consider. I've been reading for a few days already and I only get more confused, so please don't give me more links to read :)


Solution

  • async code can be mixed with parallelism by calling the methods without await, then awaiting Task.WhenAll(). However, the main consideration when looking at parallelism is ensuring that the code called is thread-safe. DbContexts are not thread-safe, so to run parallel operations you need a separate DbContext instance for each method. This means that code that normally relies on dependency injection to receive a DbContext/Unit of Work, and would get a reference that is lifetime-scoped to something like the web request, cannot be used in parallelized calls. Calls that are parallelized need a DbContext that is scoped to just that call.
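
As a minimal sketch of the "call without await, then await Task.WhenAll()" pattern, the example below uses two hypothetical methods, GetInfo1Async and GetInfo2Async, that simulate I/O-bound work with Task.Delay. No database or DbContext is involved, so the names and delays are assumptions made purely for illustration.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class WhenAllSketch
{
    // Hypothetical stand-ins for I/O-bound queries. Task.Delay simulates the
    // wait without blocking a thread (unlike Thread.Sleep).
    public static async Task<string> GetInfo1Async()
    {
        await Task.Delay(300);
        return "info1";
    }

    public static async Task<string> GetInfo2Async()
    {
        await Task.Delay(600);
        return "info2";
    }

    public static async Task<(string First, string Second, long ElapsedMs)> RunConcurrentlyAsync()
    {
        var watch = Stopwatch.StartNew();

        // Start both operations before awaiting either one.
        Task<string> first = GetInfo1Async();
        Task<string> second = GetInfo2Async();

        // Await both. Results come back in the same order as the tasks passed in.
        string[] results = await Task.WhenAll(first, second);

        watch.Stop();
        return (results[0], results[1], watch.ElapsedMilliseconds);
    }
}
```

With these delays the elapsed time should be close to the slower operation rather than the sum of both. Note that this only holds when the methods are genuinely asynchronous: a method that blocks (for example with Thread.Sleep) and then returns Task.FromResult has already done all of its work by the time it hands back the task, so starting it "without await" gains nothing.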

    When parallelized methods work with EF entities, you also need to ensure that any entity references are treated as detached entities. They cannot safely be associated with one another, because they will have been returned by different DbContexts in different parallel tasks.

    For example, using normal async & await:

    var order = await Repository.GetOrderById(orderId);
    var orderLine = await Repository.CreateOrderLineForProduct(productId, quantity);
    order.OrderLines.Add(orderLine);
    await Repository.SaveChanges();
    

    This is a very basic example where the repository class gets a DbContext injected. The CreateOrderLineForProduct method would use the DbContext to load the Product and possibly other details to make an OrderLine. When awaited one at a time, the async variants ensure that only one operation is accessing the DbContext at a time, so the same single DbContext instance can be used by the Repository. The Order, new OrderLine, Product, etc. are all tracked by the same DbContext instance, so a SaveChanges call issued by the repository against that single instance will work as expected.

    If we tried to parallelize it like:

    var orderTask = Repository.GetOrderById(orderId);
    var orderLineTask = Repository.CreateOrderLineForProduct(productId, quantity);
    await Task.WhenAll(orderTask, orderLineTask);
    var order = orderTask.Result;
    var orderLine = orderLineTask.Result;
    
    order.OrderLines.Add(orderLine);
    await Repository.SaveChanges();
    

    This would likely result in exceptions from EF reporting that the DbContext is being accessed across threads, since both GetOrderById and the calls within CreateOrderLineForProduct use it at the same time. What's worse, EF won't detect that it is being called by multiple threads until those threads actually try to access the DbSets etc. at the same time. So this can result in an intermittent error that might not appear during testing, or appear reliably when not under load (queries all finish quite quickly and don't trip over each other), but that brings the application to a halt with exceptions when running under load. To address this, the DbContext reference in the Repository needs to be scoped to each method. This means that rather than using an injected DbContext, it needs to look more like:

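    public async Task<Order> GetOrderById(int orderId)
    {
        // Each call creates and disposes its own context, so parallel calls never share one.
        using (var context = new AppDbContext())
        {
            return await context.Orders
                .Include(x => x.OrderLines)
                .AsNoTracking()
                .SingleAsync(x => x.OrderId == orderId);
        }
    }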
    

    We could still use dependency injection to inject something like a DbContext factory class that creates the DbContext and can be mocked out. The key thing is that the scope of the DbContext must be moved to within the parallelized method. AsNoTracking() is important because we cannot leave this order "tracked" by this DbContext, which is being disposed. When we want to save the order and any other associated entities, we will have to associate the order with a new DbContext instance, and if the entity still thinks it's tracked, that will result in an error. This also means that the repository Save has to change to something more like:

    Repository.Save(order);
    

    That is, the caller passes in an entity, and Save associates it and all referenced entities with a DbContext and then calls SaveChanges.
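
To illustrate the scoping problem and the factory idea without pulling in EF itself, here is a rough sketch built on a hypothetical FakeContext that rejects overlapping operations on the same instance. FakeContext, IContextFactory, ContextFactory and this Repository are all invented for the example and are not EF types. The fake fails deterministically on overlap, whereas a real DbContext may only fail intermittently.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a DbContext: it rejects a second operation
// that starts while another is still running on the same instance.
public sealed class FakeContext : IDisposable
{
    private int _busy; // 0 = idle, 1 = an operation is in flight

    public async Task<string> QueryAsync(string name, int delayMs)
    {
        if (Interlocked.CompareExchange(ref _busy, 1, 0) != 0)
            throw new NotSupportedException("A second operation started on this context.");
        try
        {
            await Task.Delay(delayMs); // simulated database round trip
            return name;
        }
        finally
        {
            Interlocked.Exchange(ref _busy, 0);
        }
    }

    public void Dispose() { }
}

// Hypothetical factory abstraction that can be injected and mocked.
public interface IContextFactory
{
    FakeContext Create();
}

public sealed class ContextFactory : IContextFactory
{
    public FakeContext Create() => new FakeContext();
}

public sealed class Repository
{
    private readonly IContextFactory _factory;
    public Repository(IContextFactory factory) { _factory = factory; }

    // The context lives only for the duration of this call.
    public async Task<string> GetAsync(string name, int delayMs)
    {
        using (var context = _factory.Create())
        {
            return await context.QueryAsync(name, delayMs);
        }
    }
}

public static class ScopingSketch
{
    // Two overlapping operations on one shared context: the second one fails.
    public static async Task<bool> SharedContextFailsAsync()
    {
        using (var shared = new FakeContext())
        {
            Task<string> a = shared.QueryAsync("order", 100);
            Task<string> b = shared.QueryAsync("orderLine", 100);
            try { await Task.WhenAll(a, b); return false; }
            catch (NotSupportedException) { return true; }
        }
    }

    // One context per call: the operations overlap safely.
    public static Task<string[]> PerCallContextsAsync()
    {
        var repository = new Repository(new ContextFactory());
        return Task.WhenAll(repository.GetAsync("order", 100), repository.GetAsync("orderLine", 100));
    }
}
```

SharedContextFailsAsync returns true because the second query starts while the first is still in flight, while PerCallContextsAsync completes with both results because each call gets its own context from the factory.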

    Needless to say, this starts getting messy, and it hasn't even touched on things like exception handling. You also lose aspects like change tracking because of the need to work with detached entities. To avoid potential issues between tracked and untracked entities, I would recommend that parallelized code always deal with POCO view models or more complete "operations" with entities, rather than doing things like returning detached entities. We want to avoid confusion between code that might be called with an Order that is tracked (using synchronous or async calls) and an Order that is not tracked because it is the result of a parallelized call. That said, it can have its uses, but I would highly recommend keeping its use to a minimum.
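
As a small illustration of returning a POCO view model instead of a detached entity, the sketch below projects a hypothetical Order into a flat OrderSummaryViewModel. The class shapes and property names are assumptions loosely based on the Order example in this answer, and the projection runs in memory here. In real EF code it would more likely be part of the query itself.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity shapes, loosely following the Order example.
public class OrderLine
{
    public int ProductId { get; set; }
    public int Quantity { get; set; }
    public decimal UnitPrice { get; set; }
}

public class Order
{
    public int OrderId { get; set; }
    public List<OrderLine> OrderLines { get; set; } = new List<OrderLine>();
}

// Plain view model: no navigation properties, nothing for a context to track.
public class OrderSummaryViewModel
{
    public int OrderId { get; set; }
    public int LineCount { get; set; }
    public decimal Total { get; set; }
}

public static class ProjectionSketch
{
    // Copies only the values the caller needs out of the entity graph.
    public static OrderSummaryViewModel ToSummary(Order order)
    {
        return new OrderSummaryViewModel
        {
            OrderId = order.OrderId,
            LineCount = order.OrderLines.Count,
            Total = order.OrderLines.Sum(line => line.Quantity * line.UnitPrice)
        };
    }
}
```

Because the view model holds only copied values, it can be passed between parallel tasks and combined with other results without any question of which DbContext, if any, is tracking it.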

    async/await can be an excellent pattern to adopt for longer, individual operations where the web request can expect to wait a few seconds, such as a search or a report. It frees up the request-handling thread to start responding to other requests while the user waits. Hence its use is to boost server responsiveness, not to be confused with making calls faster. For short and snappy operations it ends up adding a bit of extra overhead, so these should just be left as synchronous calls. async is not something I would ever argue needs to be an "all or nothing" decision in an application.

    So the example above, loading an Order by ID and creating an OrderLine, is something that I would normally leave synchronous rather than asynchronous. Loading an entity graph by ID is typically quite fast. A better example of where I would leverage async would be something like:

    var query = Repository.GetOrders()
        .Where(x => x.OrderStatus.OrderStatusId == OrderStatus.New
            && x.DispatchDate <= DateTime.Today);
    if (searchCriteria.Any())
        query = query.Where(buildCriteria(searchCriteria));
    
    // Skip/Take need an explicit ordering in LINQ to Entities.
    var pendingOrders = await query.OrderBy(x => x.DispatchDate)
        .Skip(pageNumber * pageSize)
        .Take(pageSize)
        .ProjectTo<OrderSearchResultViewModel>()
        .ToListAsync();
    

    In this example I have a search operation that is expected to run across a potentially large number of orders, and possibly include less efficient user-defined search criteria, before fetching a page of results. It might take less than a second or several seconds to run, and there could be a number of other calls, including other searches, from other users being processed at the same time.
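
The paging arithmetic in that query can be checked in isolation. The sketch below applies the same Skip(pageNumber * pageSize) and Take(pageSize) steps to an in-memory sequence of integers. It assumes a zero-based pageNumber, and the GetPage helper is hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class PagingSketch
{
    // pageNumber is zero-based here: page 0 skips nothing.
    public static List<int> GetPage(IEnumerable<int> orderIds, int pageNumber, int pageSize)
    {
        return orderIds
            .OrderBy(id => id)             // paging needs a stable order
            .Skip(pageNumber * pageSize)   // rows belonging to earlier pages
            .Take(pageSize)                // at most one page of rows
            .ToList();
    }
}
```

For example, PagingSketch.GetPage(Enumerable.Range(1, 25), 2, 10) returns the five values 21 through 25, since the first two pages account for 1 through 20.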

    Parallelization is more geared towards situations where there is a mix of long- and short-running operations that need to be completed as a unit, so that one doesn't need to wait for another to complete before it starts. Much more care needs to be taken in this model when it comes to operations with EF entities, so it's definitely not a pattern I would design as the "default" in a system.

    So to summarize:

    Synchronous - Quick hits to the database or in-memory cache such as pulling rows by ID or in general queries expected to execute in 250ms or less. (Basically, the default)

    Asynchronous - Bigger queries across larger sets with potentially slower execution time such as dynamic searches, or shorter operations that are expected to be called extremely frequently.

    Parallel - Expensive operations that will launch several queries to complete, where the queries can be "stripped" down to the necessary data and run completely independently and in the background, e.g. reports or building exports.