Search code examples
c#entity-frameworklinqoptimizationquery-optimization

Speeding up entity framework calls


I have a repository for data calls in an API currently that, on test, works perfectly as expected, though with larger data sets, seems to almost get exponentially slower. Ive been looking into more and more ways to try and better performances, though I cant seem to find much information relating to includes and speeding up data queries with Entity Framework.

Below is an example of one of my slower calls. Id like to ask what might be a better way of going about this kind of query. Ill also include a relational diagram of the DB for reference as to the entities it is calling for clarity. Any thoughts?

        public async Task<List<Logic.Objects.Client>> GetAllClients()
        {
            List<Logic.Objects.Client> clientList = (await santaContext.Client
                /* Surveys and responses */
                .Include(c => c.SurveyResponse)
                    .ThenInclude(sr => sr.SurveyQuestion.SurveyQuestionOptionXref)
                        .ThenInclude(sqox => sqox.SurveyOption)
                .Include(c => c.SurveyResponse)
                    .ThenInclude(sr => sr.SurveyQuestion.SurveyQuestionXref)

                .Include(c => c.SurveyResponse)
                    .ThenInclude(sr => sr.Survey.EventType)

                /* Sender/Assignment info and Tags */
                .Include(c => c.ClientRelationXrefRecipientClient)
                    .ThenInclude(crxsc => crxsc.SenderClient.ClientTagXref)
                        .ThenInclude(txr => txr.Tag)
                .Include(c => c.ClientRelationXrefSenderClient)
                    .ThenInclude(crxrc => crxrc.RecipientClient.ClientTagXref)
                        .ThenInclude(txr => txr.Tag)

                /* Relationship event */
                .Include(c => c.ClientRelationXrefSenderClient)
                    .ThenInclude(crxsc => crxsc.EventType)
                .Include(c => c.ClientRelationXrefRecipientClient)
                    .ThenInclude(crxrc => crxrc.EventType)

                /* Chat messages */
                .Include(c => c.ClientRelationXrefRecipientClient)
                    .ThenInclude(cm => cm.ChatMessage)
                .Include(c => c.ClientRelationXrefSenderClient)
                    .ThenInclude(cm => cm.ChatMessage)

                /* Assignment statuses */
                .Include(c => c.ClientRelationXrefRecipientClient)
                    .ThenInclude(stat => stat.AssignmentStatus)
                .Include(c => c.ClientRelationXrefSenderClient)
                    .ThenInclude(stat => stat.AssignmentStatus)

                /* Tags */
                .Include(c => c.ClientTagXref)
                    .ThenInclude(t => t.Tag)

                /* Notes */
                .Include(c => c.Note)

                /* Client approval status */
                .Include(c => c.ClientStatus)
                .AsNoTracking()
                .ToListAsync())
                .Select(Mapper.MapClient).ToList();

            return clientList;
        }

ERD


Solution

  • One of the reasons that your queries are slow are because you use Include to fetch sub-collections.

    If Client [10] has 1000 ChatMessages, then every ChatMessage will have a foreign key ClientId with a value 10. If you fetch "Client [10] with his ChatMessages" you will be sending this value 10 more than 1000 times.

    Furthermore, if you use Include to fetch sub-items, and fetch complete objects, the fetched items are also stored in the DbContext.ChangeTracker. Before you fetch complete objects from the DbContext, it is checked whether the item is already in the ChangeTracker, and if not, the data is queried from the database management system and put in the ChangeTracker.

    When using entity framework always use Select, and select only the properties that you actually plan to use. Only select complete objects and only use Include if you plan to update the fetched data.

    Certainly don't use Include to save you some typing.

    Furthermore, it is wise not to use the original table row classes in your return, but to use either anonymous types, or if you need to return from a method, use a special repository class that is similar to the original tables.

    The advantage of intermediate repository classes gives you the freedom to hide your database layout. You even hide that you are using a database. It can be an XML file, or JSON, or for unit tests a Dictionary. Because you disconnect the table layout from the data that users of your methods get, You can deviate the returned structures from the database tables. This enables different queries for different user types. Future changes in database layout can be done without having to change the repository queries.

    For instance, if you decide to add a property IsObsolete, then you could choose to give ordinary users queries that returns Clients without this property. If an ordinary user queries "all Clients that ..." , he gets "all non-obsolete clients that ...". SuperUser can declare Clients Obsolete, or reverse it, if he made an error. If a SuperUser queries Clients, het gets a different kind of Repository Client, namely one that has the extra property IsObsolete. An Administrator user can regularly delete all rows from the tables that are Obsolete for several months now, something that SuperUsers can't.

    An example of making it possible to change tables without having to change the database layout:

    A Student has an Address. Currently your DbContext Student class is similar to the Repository Student class:

    class Student
    {
        public int Id {get; set;}
        public string Name {get; set;}
        public string Street {get; set;}
        public string City {get; set;}
        ...
    }
    

    Later you find that quite often the Street / City contains typing errors. Hence you buy a CDROM with tables with all addresses in the country, and decide to make a one-to-many relation between Address and Student: every Address has zero or more Students, every Student lives at exactly one Address, namely the Address that the foreign key AddressId refers to.

    class Student
    {
        public int Id {get; set;}
        public string Name {get; set;}
    
        public int AddressId {get; set;}
    }
    

    Because you know that your address tables are correct you can be certain that every Student lives at an existing Address, without any typing error. If the government decides to rename a street, you don't have to update the Students that live in this Street, only the changed Addresses.

    If a Student moves, all you have to do is point to the new Address, and all Address lines are correctly changed.

    Because you separated the DbContext Student from the Repository Student, query-only users from your Repository won't notice any change when they query Students. Only Student Updaters will see an interface change.

    Back to your question

    So your query will considerably speed up if you do the following:

    List<Logic.Objects.Client> clientList = (await santaContext.Clients
    .Select(client => new Repository.Client()
    {
        // only Select the Client properties that you plan to use
        Id = client.Id,
        Name = client.Name,
        ...
    
        SurveyResponses = client.SurveyReponses.Select(surveryResponse => new Repository.SurveyResponse
        {
             // again: only the properties that you plan to use:
             Id = surveyResponse.Id,
             ...
    
             // not needed, you already got the value:
             // ClientId = surveyResponse.ClientId,
    
             Questions = surveyResponse.SurveyQuestions.Select(...).ToList(),
             Responses = surveyReponses...
        })
        .ToList(),
    
        // similar for Sender / Assignment / Tag / etc
    })
    

    So several performance improvements:

    • Because you used Select, the dbContext didn't check if the data was already in the ChangeTracker, and after if has been fetched it isn't added to the ChangeTracker
    • Because you only selected properties that you plan to use, no irrelevant data is fetched, nor data of which you already know the value, like values of foreign keys.
    • Not an improvement that will speed up your queries, but will speed up future changes: because you separated database layout from the repository classes, you enabled future changes in the database without having users to notice these changes.