I've run into some memory problems while using EF 4.1. The problem mainly happens in this situation: imagine that I have Students who can attend one or more Courses, and multiple students can attend the same course. So I have something like:
Student * < - > 1-* Course
Imagine that I have 2 students and 1 course in my DB, like this:
Ana attends English Course
Bob attends English Course
My Object Graph is something like this:
Ana
\
English Course
/
Bob
This is fine.
I save this and it saves fine: two rows in the students table and one in the courses table.
The problem is when I try to get this data.
When I do something like:
var students = (from s in context.Students
                select s).Include("Courses");
This is the resulting graph:
Ana -> English Course
Bob -> English Course
The English Course object is duplicated. Imagine the situation when the depth of this tree is much bigger, and there are thousands of students and thousands of courses, with hundreds of students attending the same course.
The memory usage of this query would be huge. How can I solve this reference problem?
If you use AsNoTracking in your queries, the objects are not loaded into the context and not cached there. But without the context you don't have identity mapping, which means you don't have a unique mapping between key property values and object reference identities. As a consequence, EF will create a new object for every loaded navigation property, so you'll get multiple objects for the same key. This makes loading faster, because no identity map and no property snapshot for change tracking need to be created, but it potentially consumes more memory.
As far as I know, without loading the data into the context there is no way to avoid the duplication of objects during object materialization.
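To make the identity-mapping point concrete, here is a minimal sketch in plain C# with no EF involved. The `Course` class, the `IdentityMap` class and its `Materialize` method are hypothetical names invented for this illustration; they only mimic conceptually what a tracking context does and are not EF's actual implementation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical entity, for illustration only.
public class Course
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Hypothetical stand-in for a context's identity map:
// at most one object instance per key value.
public class IdentityMap
{
    private readonly Dictionary<int, Course> _byKey = new Dictionary<int, Course>();

    public Course Materialize(int id, string name)
    {
        Course existing;
        if (_byKey.TryGetValue(id, out existing))
        {
            return existing; // same key -> same reference, no new object
        }

        var created = new Course { Id = id, Name = name };
        _byKey.Add(id, created);
        return created;
    }
}

public static class IdentityMapDemo
{
    public static void Main()
    {
        // With an identity map: both students share one Course instance.
        var map = new IdentityMap();
        Course anaTracked = map.Materialize(1, "English Course");
        Course bobTracked = map.Materialize(1, "English Course");
        Console.WriteLine(ReferenceEquals(anaTracked, bobTracked)); // True

        // Without an identity map: every result row becomes a new object.
        var anaCourse = new Course { Id = 1, Name = "English Course" };
        var bobCourse = new Course { Id = 1, Name = "English Course" };
        Console.WriteLine(ReferenceEquals(anaCourse, bobCourse)); // False
    }
}
```

The lookup by key is the extra work that a no-tracking query skips, which is why it is faster but can hand back several objects for the same row.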
To possibly improve the performance when loading the data without AsNoTracking, you could try to move from eager loading the courses collection to explicit loading. Eager loading is known to lead to a huge multiplication of the data transferred between database and client, which can have a very negative impact on performance. Using explicit loading, your code would look like this:
// no Include and no AsNoTracking here
var students = (from s in context.Students select s).ToList();
foreach (var student in students)
{
    context.Entry(student).Collection(s => s.Courses).Load();
}
This creates one additional database query per loaded student to load the Courses collection. Regarding performance it sounds crazy to do this, but there are examples where this can still be much faster than a single query with eager loading (as this example shows: Detect entities which have the same children; see the comments on that answer: performance gain from 167 sec to 3.4 sec after moving from eager to explicit loading).
The objects shouldn't be duplicated here because they get materialized into the context.
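If you want to check whether a loaded graph really contains duplicates, one option is to compare the number of distinct object references with the number of distinct keys. The following is only a sketch in plain C#: `Student`, `Course`, `ReferenceComparer` and `DuplicationCheck` are hypothetical names, and the two graphs are built by hand instead of being loaded through EF.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.CompilerServices;

// Hypothetical entities mirroring the model from the question.
public class Course
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class Student
{
    public Student() { Courses = new List<Course>(); }
    public string Name { get; set; }
    public ICollection<Course> Courses { get; private set; }
}

// Compares objects by reference, ignoring any Equals/GetHashCode overrides.
public sealed class ReferenceComparer : IEqualityComparer<object>
{
    bool IEqualityComparer<object>.Equals(object x, object y)
    {
        return ReferenceEquals(x, y);
    }

    int IEqualityComparer<object>.GetHashCode(object obj)
    {
        return RuntimeHelpers.GetHashCode(obj);
    }
}

public static class DuplicationCheck
{
    // Number of distinct Course object instances reachable from the students.
    public static int CountCourseInstances(IEnumerable<Student> students)
    {
        var seen = new HashSet<object>(new ReferenceComparer());
        foreach (var course in students.SelectMany(s => s.Courses))
        {
            seen.Add(course);
        }
        return seen.Count;
    }

    // Number of distinct Course key values reachable from the students.
    public static int CountCourseKeys(IEnumerable<Student> students)
    {
        return students.SelectMany(s => s.Courses).Select(c => c.Id).Distinct().Count();
    }

    public static void Main()
    {
        // Graph as a tracking context is expected to build it: one shared instance.
        var shared = new Course { Id = 1, Name = "English Course" };
        var ana = new Student { Name = "Ana" };
        var bob = new Student { Name = "Bob" };
        ana.Courses.Add(shared);
        bob.Courses.Add(shared);
        var tracked = new List<Student> { ana, bob };
        Console.WriteLine(CountCourseInstances(tracked) + " instance(s), "
            + CountCourseKeys(tracked) + " key(s)");
    }
}
```

When the instance count is higher than the key count, the same row has been materialized more than once.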