Search code examples
c#.netlinqparallel-extensions

Using Parallel Extensions or Parallel LINQ with LINQ Take


I have a database with about 5 million rows in it. I am trying to generate XML strings for the database and push them to a service. Instead of doing this one at a time, the service supports taking 1000 records at a time. At the moment, this is quite slow, taking upwards of 10 seconds per 1000 records (including writing back to the database and uploading to the service).

I tried to get the following code working, but have failed... I get a crash when I try it. Any ideas?

    var data = <insert LINQ query here>
    int take = 1000
    int left = data.Count();

    Parallel.For(0, left / 1000, i =>
        {
            data.Skip(i*1000).Take(1000)...
            //Generate XML here.
            //Write to service here...
            //Mark items in database as generated.
        });
        //Get companies which are still marked as not generated.
        //Create XML.
        //Write to Service.

I get a crash telling me that the index is out of bounds. If left is 5 million, the number in the loop should be no more than 5000. If I multiply that again by 1000, I should not get more than 5 million. I wouldn't mind if it worked for a bit, and then failed, but it just fails after the SQL query!


Solution

  • I suspect the index out of bounds error is caused by code other than what is currently being displayed.

    That being said, this could be handled in a much cleaner manner. Instead of using this approach, you should consider switching to using a custom partitioner. This will be dramatically more efficient, as each call to Skip/Take is going to force a re-evaluation of your sequence.