c#linq asp.net-core entity-framework-core deferred-execution

Is my C#.NET LINQ expression optimized for deferred execution (Entity Framework Core)

I am building a REST API using C#.NET for a customer, which will be used to retrieve error logs from a database. The database has three tables: Fault, Message and MessageData. The relation of the tables are as follows:

Fault <---* Message <---1 MessageData

Meaning one Fault can have multiple messages linked to it from the Message table, which in turn can have one message data linked to it from the MessageData table.

I have created Entity Models representing these tables using Entity Framework Core. I have also created DTOs for each Entity Model, containing only the data that is relevant to transfer over the wire. In my repository class, I am using LINQ to write a query to the database, mapping the result over to my DTOs from my Entity Models.

My question is whether this code can be rewritten to be more sufficient, specially in terms of deferred execution (not wanting to make any unnessecary roundtrips to the database):

public async Task<IEnumerable<FaultDTO>> GetFaultsAsync(string? application, DateTime? fromDate, DateTime? toDate, int? count)
        {
            List<FaultDTO> faults;

            fromDate = (fromDate == null) ? DateTime.Today.AddDays(-30) : fromDate;
            toDate = (toDate == null) ? DateTime.Today : toDate;
            count = (count == null) ? 10 : count;

            faults = await _context.Faults.Where(fault => String.IsNullOrWhiteSpace(application) ? fromDate <= fault.InsertedDate && fault.InsertedDate <= toDate : fault.Application.ToLower() == application.ToLower() && fromDate <= fault.InsertedDate && fault.InsertedDate <= toDate).Select(fault => new FaultDTO()
            {
                FaultId = fault.FaultId,
                InsertedDate = fault.InsertedDate,
                MachineName = fault.MachineName,
                ServiceName = fault.ServiceName,
                Scope = fault.Scope,
                FaultDescription = fault.FaultDescription,
                Messages = _context.Messages.Where(msg => fault.FaultId == msg.FaultId).Select(msg => new MessageDTO()
                {
                    MessageId = msg.MessageId,
                    MessageName = msg.MessageName,
                    MessageData = _context.MessageData.Where(msgData => msg.MessageId == msgData.MessageId).Select(msgData => new MessageDataDTO()
                    {
                        MessageData = msgData.MessageData,
                        MessageId = msgData.MessageId
                    }).SingleOrDefault(),
                    FaultId = fault.FaultId,
                }).ToList()
            }).OrderByDescending(fault => fault.InsertedDate).Take((int)count).ToListAsync<FaultDTO>();

            return faults;
        }

Also if someone could clarify if the query executed against the database at the end ('.ToListAsync();'), or is it executed partially three times at this stages: '.ToList()', '.SingleOrDefault()' and '.ToListAsync()?

As mentioned, the main focus is deferred execution. This being said, I will happily receive any suggestion for optimizing my code in terms of performance in general.

Solution

GetFaultsAsync will always get all Faults that ...

First of all, you put all Faults in a List and then throw away this information. if users (= software, not operators) want to know the number of Faults, they have to do a different query, or Count() all elements. One improvement would be to return an ICollection<Fault>, so people can Count. They can even Add / Remove Faults if they want.

Of course you could return IList<Fault>, but IMHO the index has no meaning:

IList<Fault> fetchedFaults = await GGetFaultsAsync(...)
Fault fault4 = fetchedFaults[4];

Number 4 would be quire meaningless. Therefor my advice would be to return ICollection<Fault>, or maybe IReadonlyCollection<Fault>, if you don't want people to add / remove items to the fetched data. On the other hand, if that means that people will copy the fetched data to a new list, why not allow them to change the originally fetched data?

Another improvement:

var fetchedFaults = await GGetFaultsAsync(...)
var faultsToProcess = fetchedFaults.Take(3);

What a waste to fetch all 10.000 faults, and then only use the first 3 of them!

When using LINQ it is wise to keep the return value IQueryable / IEnumerable as long as possible. Let the user of the query decide whether he want to add other LINQ statements or not and when he wants to materialize them: `ToList() / FirstOrDefault() / Count() / etc.

The most easy method to do this, would be to make an extension method that takes an IQueryable<Fault> as input, and returns IQueryable<FaultDto>. If you are not familiar with extension methods see Extension methods demystified

public static IQueryable<FaultDto> GetFaults(this IQueryable<Fault> faults,
    string? application, 
    DateTime? fromDate, DateTime? toDate)
{
    return faults.Where(fault => ...)
        .Select(fault => new FaultDto
        {
            FaultId = fault.FaultId,
            InsertedDate = fault.InsertedDate,
            ...
        });
}

Usage:

string? application = ...
DateTime? fromDate = ...
DateTime? toDate = ...
using (var dbContext = new MyDbContext())
{
    var result = await dbContext.Faults.GetFaults(application, fromDate, toDate)
        .GroupBy(fault => fault.MachineName)
        .Take(10)
        .ToListAsync();
    this.Process(result);
}

Advantages:

The user can concatenate other LINQ statements
you won't fetch more items than actually used.
the caller decides whether he wants to use async/await or not.
the caller decides on which sequence of Faults he wants to use

Example of the latter:

var result = await dbContext.Faults
    .Where(fault => fault.MachineName == "SPX100")
    .GetFaults(application, fromDate, toDate);

Disadvantage: The caller has to create the source of the Faults (the DbContext).

Repository

If desired, you can hide the source of the Faults: it can be a database that uses entity framework, but it could also be a CSV-file, or a Dictionary (for unit tests?), maybe a REST-call on the internet?

If you want this, make a "repository" class. All that users know is that you can put items in the Repository, and later fetch them again, even after a restart of the program. The repository hides that the items are accessed using entity framework.

If you only need to query items, create a repository with IQueryable<...> properties. If you want to Add / Remove items using this repository class, use ICollection<...> or IDbSet<...> properties. Be aware though, that the latter solution limits the possibilities to change the internals.

class Repository : IDisposable
{
    private readonly MyDbContext dbContext = new MyDbContext(...);

    public IQueryable<Fault> Faults => dbContext.Faults;
    public IQueryable<Message> Messages => dbContext.Messages;
    ...

    // Dispose() disposes the DbContext
}

Usage:

using (Repository repository = new Repository()
{
    var result = await repository.Faults.GetFaults(application, fromDate, toDate)
        .GroupBy(fault => fault.MachineName)
        .Take(10)
        .ToListAsync();
    this.Process(result);
}

Another advantage of using a Repository, you can give different users different Repositories: some users only want to query items, some need to Add / Remove / Change items, while only superusers need to Create or Change Tables.