c# .net linq csvhelper

CsvHelper - Split output files


I'm using CsvHelper to write out a LINQ query with millions of rows. I would like to split the output into several files of, for instance, 1 million rows each. Can I do that with CsvHelper, or should I use a different writing method?

Here is my code:

var _path = UniversalVariables.outputCsvFiles + "entire_output.csv"; 

var pvQuery = from car in Cars 
              select car;


if (!Directory.Exists(UniversalVariables.outputCsvFiles))
{
    Directory.CreateDirectory(UniversalVariables.outputCsvFiles);
}

using (var sw = new StreamWriter(_path))
using (var csv = new CsvWriter(sw))
{
    csv.Configuration.Delimiter = UniversalVariables.csvDelimiter;
    csv.Configuration.HasHeaderRecord = true;

    csv.WriteHeader<Car>();
    csv.NextRecord();
    csv.WriteRecords(pvQuery);

    sw.Flush();
}

Solution

  • You could use LINQ to split the collection into sub-collections (chunks of size n). For example:

    pvQuery.Select((x,index)=>new {Value=x,Index=index})
                  .GroupBy(x=>(int)(x.Index/numberOfItemsPerGroup))
                  .Select(x=>x.Select(c=>c.Value));
    

    Making it an extension method:

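    using System.Collections.Generic;
    using System.Linq;

    static class Extensions
    {
        // Splits source into consecutive groups of at most numberOfItemsPerGroup items.
        public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> source, int numberOfItemsPerGroup)
        {
            return source.Select((x, index) => new { Value = x, Index = index })
                         .GroupBy(x => x.Index / numberOfItemsPerGroup)
                         .Select(g => g.Select(c => c.Value));
        }
    }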
    

    Client code

    SourceCollection.Split(numberOfItemsPerGroup);
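
The answer stops at splitting the collection, so here is a minimal sketch of the remaining step: writing each chunk to its own numbered file. `ChunkedFileWriter`, `WriteChunks`, the `baseName` parameter and the `_1.csv`, `_2.csv` naming scheme are all hypothetical names chosen for illustration. The `writeChunk` callback stands in for the CsvWriter code from the question, so the sketch itself does not depend on any particular CsvHelper version.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class ChunkedFileWriter
{
    // Hypothetical helper: writes every chunk to "<baseName>_<n>.csv" inside
    // 'directory' (n starts at 1) and returns the paths it created.
    // 'writeChunk' is an assumption: it receives an open writer and one chunk,
    // and is where the CsvWriter calls from the question would go.
    public static List<string> WriteChunks<T>(
        IEnumerable<IEnumerable<T>> chunks,
        string directory,
        string baseName,
        Action<TextWriter, IEnumerable<T>> writeChunk)
    {
        // CreateDirectory does nothing if the directory already exists.
        Directory.CreateDirectory(directory);

        var paths = new List<string>();
        var part = 0;
        foreach (var chunk in chunks)
        {
            part++;
            var path = Path.Combine(directory, baseName + "_" + part + ".csv");

            // One StreamWriter per chunk; disposing it flushes and closes the file.
            using (var sw = new StreamWriter(path))
            {
                writeChunk(sw, chunk);
            }
            paths.Add(path);
        }
        return paths;
    }
}
```

With the question's code, the call would look roughly like `ChunkedFileWriter.WriteChunks(pvQuery.Split(1000000), UniversalVariables.outputCsvFiles, "entire_output", (sw, rows) => { ... })`, where the callback wraps `sw` in a `CsvWriter`, sets the delimiter, and calls `WriteRecords(rows)` as in the original snippet. One caveat on the design: `GroupBy` reads the entire source before it yields the first group, so with millions of rows the whole result set is held in memory at once. On .NET 6 or later, `Enumerable.Chunk` produces fixed-size arrays one at a time and may be a better fit for large queries.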