Search code examples
c#multithreadinglinqlinq-to-sqldeferred-execution

How to use Linq's .Count() method in a conditional statement


I have a thread that periodically checks one of my MS SQL tables for any records that have their "Processed" bit field set to 0. The thread then performs some code using those records and then sets their Processed bits to 1; using it basically as a queue. The Linq query that I'm using to retrieve those records spans multiple lines and is quite complicated (for reasons unimportant to the question), so here is a very simplified version:

var RecordsToProcess = MyTable.Where(i => i.Processed == 0); // Very simplified

I need to wait until all the records have been processed before continuing, so I would like to use something like this:

while (RecordsToProcess.Count() > 0)
{
    System.Threading.Thread.Sleep(1000);
}

The problem is that while the thread does in fact process the records and sets their Processed bit to 1, the value of RecordsToProcess.Count() in the condition statement never decreases and thus we get an infinite loop. My guess is that calling .Count() stores that integer in memory, and then each iteration of the loop looks at that value instead of querying the database to get the current count. I can get the behavior I want by moving the query into the condition statement like so:

while (true)
{
    if (MyTable.Where(i => i.Processed == 0).Count() > 0)
        System.Threading.Thread.Sleep(1000);
    else
        break;
}

Since the query I'm actually using is much more complicated than the one in this example, doing it this way makes it hard to read. Is there something I can use that's similar to RecordsToProcess.Count() > 0, but that queries that database each iteration instead of using the initial count that was stored in memory (assuming I'm correct on that)?

Note: I wouldn't normally use a potentially dangerous while loop like that, but I'll only need to run this page a maximum of 4 or 5 times and then never again. So I'm not too worried about it.


Solution

  • Edited original post based on the comments.

    I believe part of the issue is how the compiler optimizes loops.

    It is likely something in your query that is caching data. If the whole query used lazy evaluation, except the Count being checked in the loop, every time you called Count on the query it would be re-evaluated. In your second example, the entire query is in the loop and thus has to be re-evaluated each time, regardless of whether or not is actually uses lazy evaluation. I would check the remarks in the MSDN documention on the operators you're using.

    I would also suggest using Any instead of Count in this situation for performance and clarity. Depending on what you're iterating, Count will usually iterate over a collection to see how many elements, but Any is lazier. In LINQ to Object, Count() is optimized for sequences implementing ICollection, to use the Count property, which is much faster than iterating, and Any() stops checking after it finds 1 element. As suggested by Erik below, In LINQ to SQL there is likely something like a TOP 1 added to the SELECT statement. I would assume SQL has it's own COUNT optimization, but I haven't done any research.

    Using Any() when appropriate can also help readability by getting rid of the operator in Count() > 0, and more clearly expresses that you're interested in a bool not an int.

    I would implement your method like this:

    var query = MyTable.Where(i => i.Processed == 0);
    while(true) {
        if (!query.Any()) break;
        Thread.Sleep(1000);
    }
    

    or better yet this, if you can get it to execute lazily:

    var query = MyTable.Where(i => i.Processed == 0);
    while(query.Any()) { Thread.Sleep(1000); }
    

    However, as mentioned by other answers, more information about how your query is constructed would be helpful.