Tags: c#, list, collections, batch-processing, generic-collections

Remove Multiple Elements From List<T>


I was wondering, is there an elegant way to remove multiple items from a generic collection (in my case, a List<T>) without doing something such as specifying a predicate in a LINQ query to find the items to delete?

I'm doing a bit of batch processing, in which I fill a List<T> with Record objects that need to be processed. Processing concludes with each object being inserted into a database. Instead of building the list and then looping through each member to process and insert it individually, I want to perform transactional bulk inserts in groups of N items from the list, because that is less resource intensive (N is the BatchSize, which I can put in a config file or equivalent).

I'm looking to do something like:

public void ProcessRecords()
{
    // listOfRecords is a List<Record>
    var listOfRecords = GetListOfRecordsFromDb( _connectionString );
    var batchSize = Convert.ToInt32( ConfigurationManager.AppSettings["BatchSize"] );

    do
    {
       var recordSubset = listOfRecords.Take(batchSize);
       DoProcessingStuffThatHappensBeforeInsert( recordSubset );

       InsertBatchOfRecords( recordSubset );

       // now I want to remove the objects added to recordSubset from the original list
       // the size of listOfRecords afterwards should be listOfRecords.Count - batchSize
    } while( listOfRecords.Any() );
}

I'm looking for a way to do this all at once, instead of iterating through the subset and removing the items that way, such as:

foreach(Record rec in recordSubset)
{
   if( listOfRecords.Contains(rec) ) 
   { 
      listOfRecords.Remove(rec);
   }
}

I was looking at using List<T>.RemoveRange( 0, batchSize ), but wanted to get some StackOverflow feedback first :) What methods do you use to maximize the efficiency of your batch processing algorithms in C#?
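For reference, here is a minimal sketch of what the RemoveRange idea could look like. List<T>.RemoveRange takes a start index and a count, so removing the first batch means passing index 0. The class BatchRemoval and the method TakeAndRemoveBatch are hypothetical names made up for this illustration, not part of the framework:

```csharp
using System;
using System.Collections.Generic;

public static class BatchRemoval
{
    // Hypothetical helper: removes up to batchSize items from the front
    // of the list and returns them as a new list.
    public static List<T> TakeAndRemoveBatch<T>(List<T> source, int batchSize)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        // Clamp the count so the last, possibly shorter, batch does not throw.
        int count = Math.Min(batchSize, source.Count);

        List<T> batch = source.GetRange(0, count); // shallow copy of the first 'count' items
        source.RemoveRange(0, count);              // removes them from the source in one call
        return batch;
    }
}
```

Calling this in a loop while listOfRecords.Count > 0 would drain the list one batch at a time. Note that removing from the front of a List<T> shifts all remaining elements, so on a large list this copying repeats for every batch.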

Any help/suggestions/hints are much appreciated!


Solution

  • With extension method

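    // Extension methods must be declared in a static class; the class name is arbitrary.
    public static class ListExtensions
    {
        // Splits the list into consecutive batches of at most batchSize items.
        public static IEnumerable<List<T>> ToBatches<T>(this List<T> list, int batchSize)
        {
            List<T> batch = new List<T>(batchSize);

            foreach (T item in list)
            {
                batch.Add(item);

                // The batch is full: hand it out and start a new one.
                if (batch.Count == batchSize)
                {
                    yield return batch;
                    batch = new List<T>(batchSize);
                }
            }

            // Yield the final partial batch, but never an empty one
            // (e.g. when the item count is an exact multiple of batchSize).
            if (batch.Count > 0)
            {
                yield return batch;
            }
        }
    }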
    

    You can then split the input list into batches without removing anything from it:

    foreach(var batch in listOfRecords.ToBatches(batchSize))
    {
       DoProcessingStuffThatHappensBeforeInsert(batch);
       InsertBatchOfRecords(batch);
    }
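As an aside, on .NET 6 or later the built-in LINQ method Enumerable.Chunk does the same kind of splitting, so a hand-written extension may not be needed there. The sketch below only records the size of each batch to show the behavior; ChunkDemo and BatchSizes are hypothetical names for this illustration, and in the original scenario the processing and insert calls would sit inside the loop instead:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChunkDemo
{
    // Returns the size of each batch produced by Enumerable.Chunk
    // (available in .NET 6 and later).
    public static List<int> BatchSizes<T>(IEnumerable<T> source, int batchSize)
    {
        var sizes = new List<int>();

        // Chunk yields arrays (T[]), not List<T>; the last one may be shorter.
        foreach (T[] batch in source.Chunk(batchSize))
        {
            // The per-batch processing and insert would go here.
            sizes.Add(batch.Length);
        }

        return sizes;
    }
}
```

Because Chunk yields arrays, methods such as InsertBatchOfRecords would need to accept a Record[] or an IEnumerable<Record> for this to drop in directly.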