DbQuery behaves differently in a foreach loop. Why?

If I use the following code, I get a list of students that study course 1 and course 2. (This is almost what I want.)

IQueryable<Student> filteredStudents = context.Students;
filteredStudents = filteredStudents
    .Where(s => s.Courses.Select(c => c.CourseID).Contains(1));
filteredStudents = filteredStudents
    .Where(s => s.Courses.Select(c => c.CourseID).Contains(2));
List<Student> studentList = filteredStudents.ToList<Student>();

However, if I try and do this in a foreach loop (as shown in the following code) then I get a list of all students that are signed up for the last course in the loop.

IQueryable<Student> filteredStudents = context.Students;
foreach (Course course in filter.Courses) {
    if (course != null) {             
        filteredStudents = filteredStudents
            .Where(s => s.Courses.Select(c => c.CourseID).Contains(course.CourseID));
    }
}
List<Student> studentList = filteredStudents.ToList<Student>();

This behaviour confuses me. Can anyone explain why this is happening? And how to get around it? Thank you.

Solution

The problem is that the foreach loop only creates a single course variable for all of the loop iterations, and this single variable is then captured into a closure. Also remember that the filters aren't actually executed until after the loop. Put those together, and by the time the filters execute this single course variable has advanced to the last item in the Courses filter; you only check against that one last course.

I see four ways to fix the problem.

First

Create a new variable for each iteration of the loop (this is probably your best quick fix)

IQueryable<Student> filteredStudents = context.Students;
foreach (Course course in filter.Courses) {
    if (course != null) {  
        int CourseID = course.CourseID;            
        filteredStudents = filteredStudents
            .Where(s => s.Courses.Select(c => c.CourseID).Contains(CourseID));
    }
}
List<Student> studentList = filteredStudents.ToList<Student>();

Second

Resolve the IEnumerable expression inside the loop (probably much less efficient):

IEnumerable<Student> filteredStudents = context.Students;
foreach (Course course in filter.Courses) {
    if (course != null) {             
        filteredStudents = filteredStudents
            .Where(s => s.Courses.Select(c => c.CourseID).Contains(course.CourseID))
            .ToList(); 
    }
}
List<Student> studentList = filteredStudents.ToList<Student>();

Third

Use more appropriate linq operators/lambda expressions to eliminate the foreach loop:

var studentList = context.Students.Where(s => s.Courses.Select(c => c.CourseID).Intersect(filter.Courses.Select(c => c.CourseID)).Any()).ToList();

Or in a more readable way:

IQueryable<Student> filteredStudents = context.Students;
var courses = filter.Courses.Select(c => c.CourseID);
var studentList = filteredStudents
       .Where(s => s.Courses.Select(c => c.CourseID)
                       .Intersect(courses)
                       .Any()
       ).ToList();

If you play with this a bit, the performance should meet or far exceed the foreach loop through clever internal use of HashSets or — if you're very lucky — by sending a JOIN query to the DB). Just be careful, because it's easy to write something here that make numerous "extra" calls out to the DB inside that Intersect() or Any() method. Even so, this is the option I tend to prefer, with the exception that I probably wouldn't bother to call .ToList() at the end.

This also illustrates why I don't have much use for ORMs like Entity Framework, linq-to-sql, or even NHibernate or ActiveRecord. If I'm just writing SQL, I can know I'm getting the correct join query. I could do that with an ORM, too, but now I still need to know about the specific SQL I'm creating, and I also have to know how to get the ORM to do what I want.

Fourth

Use C# 5.0. This is fixed in the most recent version of C#, so that each iteration of a for/foreach loop is its own variable.