Tags: c#, .net, entity-framework, linq, entity-framework-core

EF Core: Correct way to query data multiple levels deep in related one-to-many entities


I'm trying to learn to write efficient Entity Framework queries when data has to be fetched based on multiple joins, including a many-to-many via a junction table. In the following example, I'd like to fetch all States that contain a particular Book.

Let's use a model with the following tables/entities, all linked by navigation properties: State, City, Library, Book, and LibraryBook (the junction table for the many-to-many relationship between Library and Book).

  • Each State has 1 or more Cities
  • Each City has 1 or more Libraries
  • Each Library has many Books, and each Book may exist at more than one Library.

How can I best return all of the States that contain a particular Book? I'm inclined to think separate queries may work better than one large one, but I'm not certain what the best implementation is. I think getting the LibraryIds from the many-to-many relation first, in a separate query, is probably a good way to start.

So for that:

var bookId = 12;
var libraryIds = _context.LibraryBook.Where(l => l.BookId == bookId).Select(s => s.LibraryId);

If that comes first, I'm uncertain how to best query the next data in order to get the cities which contain each of those LibraryIds. I could use a foreach:

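var cities = new List<City>();
// ToList() runs the LibraryIds query first, so the loop below does not
// send new queries while that query's data reader is still open
foreach (var libraryId in libraryIds.ToList())
{
    // each iteration sends its own SQL query to the database
    var city = _context.City.Single(c => c.Libraries.Any(l => l.Id == libraryId));
    cities.Add(city);
}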

But then I'd have to do yet another foreach for the states that contain the city, and this all adds up to a lot of separate SQL queries!

Is this really the only way to go about this? If not, what is a better alternative?

Thanks in advance!


Solution

    Database management systems are extremely well optimized for combining tables and selecting columns from the result. Transporting the selected data to your application is the slower part.

    Hence it is usually better to limit the data that needs to be transported: let the DBMS do all the joining and selecting.

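    For this, you don't need to put everything in one big LINQ statement that is hard to understand (and thus hard to test, reuse and maintain). As long as your LINQ statements return IQueryable<...>, the query is not executed. Chaining several of these statements is not costly: together they still form one query, which is sent to the database only when you enumerate the result (for example with ToList(), FirstOrDefault() or foreach).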
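Applied to the question's own first step, a minimal sketch of this idea: the LibraryIds query stays a separate statement and is then reused inside the States query, with no foreach. It runs with LINQ to Objects over hypothetical in-memory data (anonymous types instead of the entity classes), so it only demonstrates deferred execution and composition, not SQL generation. Against a DbContext both statements would stay IQueryable<...>, and EF Core can translate libraryIds.Contains(...) into a subquery, so ToList() sends a single SQL statement; in EF Core 5.0 or later, ToQueryString() lets you inspect that statement.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory rows shaped like the question's tables. Anonymous types
// stand in for the entity classes; this is LINQ to Objects, not EF Core.
var libraryBooks = new[]
{
    new { LibraryId = 1, BookId = 12 },
    new { LibraryId = 2, BookId = 7 }
}.ToList();

var states = new[]
{
    new { Name = "State A", Cities = new[] { new { Libraries = new[] { new { Id = 1 } } } } },
    new { Name = "State B", Cities = new[] { new { Libraries = new[] { new { Id = 2 } } } } }
};

int bookId = 12;

// Step 1, as in the question: this only describes the sequence, nothing runs yet.
var libraryIds = libraryBooks
    .Where(libraryBook => libraryBook.BookId == bookId)
    .Select(libraryBook => libraryBook.LibraryId);

// Step 2: reuse step 1 inside the next statement instead of looping over it.
var stateNames = states
    .Where(state => state.Cities.Any(city =>
        city.Libraries.Any(library => libraryIds.Contains(library.Id))))
    .Select(state => state.Name);

// Add a row after both steps were written: if either step had already run,
// State B would be missing from the result.
libraryBooks.Add(new { LibraryId = 2, BookId = 12 });

// Only enumerating the result executes the chained steps, against the current data.
var result = stateNames.ToList();
Console.WriteLine(string.Join(", ", result)); // State A, State B
```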

    Back to your question

    If you follow the Entity Framework conventions, your one-to-many and many-to-many relations will result in classes similar to the following:

    class State
    {
        public int Id {get; set;}
        public string Name {get; set;}
        ...
    
        // every State has zero or more Cities (one-to-many)
        public virtual ICollection<City> Cities {get; set;}
    }
    
    class City
    {
        public int Id {get; set;}
        public string Name {get; set;}
        ...
    
        // Every City is a City in exactly one State, using foreign key:
        public int StateId {get; set;}
        public virtual State State {get; set;}
    
        // every City has zero or more Libraries (one-to-many)
        public virtual ICollection<Library> Libraries {get; set;}
    }
    

    Library and Books: many-to-many:

    class Library
    {
        public int Id {get; set;}
        public string Name {get; set;}
        ...
    
        // Every Library is a Library in exactly one City, using foreign key:
        public int CityId {get; set;}
        public virtual City City {get; set;}
    
        // every Library has zero or more Books (many-to-many)
        public virtual ICollection<Book> Books {get; set;}
    }
    
    class Book
    {
        public int Id {get; set;}
        public string Title {get; set;}
        ...
    
        // Every Book is a Book in zero or more Libraries (many-to-many)
        public virtual ICollection<Library> Libraries {get; set;}
    }
    

    This is all that entity framework needs to know to recognize your tables, the columns in the tables and the relations between the tables.

    You only need attributes or the fluent API if you want to deviate from the conventions: different identifiers for columns or tables, non-default types for decimals, non-default cascade-delete behavior, etc.

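    In entity framework, the columns in the tables are represented by the non-virtual properties; the virtual properties represent the relations between the tables. Strictly speaking, EF Core recognizes a navigation property by its type, not by the virtual keyword; virtual is only required if you use lazy-loading proxies.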

    The foreign key is an actual column in the table, hence it is non-virtual. The one-to-many has virtual ICollection<Type> on the "one" side and virtual Type on the "many" side. The many-to-many has virtual ICollection<...> on both sides.

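    There is no need to specify the junction table. Entity framework recognizes the many-to-many and creates the junction table for you. In EF Core this requires version 5.0 or later; earlier EF Core versions need an explicit join entity, like the LibraryBook in the question. If you use database first, you might need to use fluent API to specify the junction table.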

    But how am I supposed to do the joins without a junction table?

    Answer: don't do the (group-)joins yourself, use the virtual ICollections!

    How can I best return all of the States that contain a particular Book?

    int bookId = ...;
    var statesWithThisBookId = dbContext.States
        .Where(state => state.Cities.SelectMany(city => city.Libraries)
                                    .SelectMany(library => library.Books)
                                    .Select(book => book.Id)
                                    .Contains(bookId));
    

    In words: you have a lot of States. From every State, get all Books that are in all Libraries that are in all Cities of this State. SelectMany flattens these into one big sequence of Books. From every Book, select the Id. The result is one big sequence of BookIds (of Books that are in Libraries that are in Cities that are in the State). Keep only those States that have at least one Book with Id equal to bookId. This is still one IQueryable, so EF Core sends it to the database as a single SQL query.
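To check the query shape without a database, here is a hedged sketch that runs the same Where / SelectMany / Contains chain with LINQ to Objects over hypothetical in-memory data (anonymous types in place of the State, City, Library and Book classes). On dbContext.States, EF Core would translate the expression into SQL instead of running it in memory. The second query shows that Any(book => book.Id == bookId) is an equivalent way to write the last two steps.

```csharp
using System;
using System.Linq;

// Hypothetical sample data with the same shape as the entities:
// State -> Cities -> Libraries -> Books. Anonymous types replace the entity
// classes so this runs in memory (LINQ to Objects), without a database.
var states = new[]
{
    new
    {
        Name = "State A",
        Cities = new[]
        {
            new { Libraries = new[] { new { Books = new[] { new { Id = 12 }, new { Id = 7 } } } } }
        }
    },
    new
    {
        Name = "State B",
        Cities = new[]
        {
            new
            {
                Libraries = new[]
                {
                    new { Books = new[] { new { Id = 7 } } },
                    new { Books = new[] { new { Id = 40 } } }
                }
            }
        }
    }
};

int bookId = 12;

// The query from the answer: flatten Cities -> Libraries -> Books, project the Ids,
// and keep the States whose Ids contain bookId.
var statesWithThisBookId = states
    .Where(state => state.Cities.SelectMany(city => city.Libraries)
                                .SelectMany(library => library.Books)
                                .Select(book => book.Id)
                                .Contains(bookId))
    .Select(state => state.Name)
    .ToList();

// Equivalent formulation: Any with a predicate instead of Select + Contains.
var sameStates = states
    .Where(state => state.Cities.SelectMany(city => city.Libraries)
                                .SelectMany(library => library.Books)
                                .Any(book => book.Id == bookId))
    .Select(state => state.Name)
    .ToList();

Console.WriteLine(string.Join(", ", statesWithThisBookId)); // State A
```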

    Room for Optimization

    If you regularly need to "get the books of all libraries" for similar questions, like "Give me all States that have a Book from a certain Author" or "Give me all Libraries that have a Book with a certain title", consider creating extension methods for this. This way you can chain them like any other LINQ method. The extension methods only build the query; they don't execute it, so this costs no performance.

    Advantages of the extension method: simpler to understand, reusable, easier to test and easier to change.

    If you are not familiar with extension methods, read Extension Methods Demystified.

    using System.Linq;

    // Extension methods must live in a non-nested, non-generic static class;
    // the class name is only an example. The methods compose the query, they
    // don't execute it.
    public static class BookQueryExtensions
    {
        public static IQueryable<Book> GetBooks(this IQueryable<Library> libraries)
        {
            return libraries.SelectMany(library => library.Books);
        }

        public static IQueryable<Book> GetBooks(this IQueryable<City> cities)
        {
            // SelectMany yields an IQueryable<Library>, so the overload above applies
            return cities.SelectMany(city => city.Libraries).GetBooks();
        }
    }
    

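    Usage:

    These methods work on a query, so apply them where you build the query. Don't call your own methods inside a Where lambda: EF Core can translate navigation properties such as state.Cities there, but not a call to a method it doesn't know, so such a query fails at run time.

    Get all Books by Karl Marx in the libraries of one State:

    string author = "Karl Marx";
    int stateId = ...;
    var communistBooksInState = dbContext.Cities
        .Where(city => city.StateId == stateId)
        .GetBooks()
        .Where(book => book.Author == author)
        .Distinct();

    Get all States that have a book by Karl Marx (this filter is per State, so the navigation chain goes inside the lambda):

    var statesWithCommunistBooks = dbContext.States
        .Where(state => state.Cities
                             .SelectMany(city => city.Libraries)
                             .SelectMany(library => library.Books)
                             .Any(book => book.Author == author));

    Get all Cities without a Bible:

    string title = "Bible";
    var citiesWithoutBibles = dbContext.Cities
        .Where(city => !city.Libraries
                            .SelectMany(library => library.Books)
                            .Any(book => book.Title == title));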
    

    Because GetBooks() extends queries of Libraries and Cities, it is as if those queries have Books. You've seen the reusability above. Changes are easy too: if you later extend your database so that Cities also have BookStores, GetBooks can return the books of both the Libraries and the BookStores. The change is in one place; code that calls GetBooks() doesn't have to change.