I'm using CsvHelper to write out the results of a LINQ query with millions of rows. I would like to split the output into parts of, for instance, 1 million rows each. Can I do that with CsvHelper, or should I use a different writing method?
Here is my code:
var _path = UniversalVariables.outputCsvFiles + "entire_output.csv";
var pvQuery = from car in Cars
              select car;
if (!Directory.Exists(UniversalVariables.outputCsvFiles))
{
    Directory.CreateDirectory(UniversalVariables.outputCsvFiles);
}
using (var sw = new StreamWriter(_path))
using (var csv = new CsvWriter(sw))
{
    csv.Configuration.Delimiter = UniversalVariables.csvDelimiter;
    csv.Configuration.HasHeaderRecord = true;
    csv.WriteHeader<Car>();
    csv.NextRecord();
    csv.WriteRecords(pvQuery);
    sw.Flush();
}
You could use LINQ to split the collection into sub-collections (chunks of size n). For example:
pvQuery.Select((x, index) => new { Value = x, Index = index })
       .GroupBy(x => (int)(x.Index / numberOfItemsPerGroup))
       .Select(x => x.Select(c => c.Value));
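To see what this query produces, here is a small sketch that runs the same grouping over a short sequence of integers and renders the groups as text. The class name `GroupingDemo` and the method `Describe` are made up for illustration only.

```csharp
using System.Collections.Generic;
using System.Linq;

static class GroupingDemo
{
    // Hypothetical helper: applies the index-based grouping shown above
    // and renders each group as a comma-separated list.
    public static string Describe(IEnumerable<int> source, int numberOfItemsPerGroup)
    {
        var groups = source
            .Select((x, index) => new { Value = x, Index = index })
            // Integer division: indexes 0..n-1 get key 0, n..2n-1 get key 1, and so on.
            .GroupBy(x => x.Index / numberOfItemsPerGroup)
            .Select(g => g.Select(c => c.Value));

        return string.Join(" | ", groups.Select(g => string.Join(",", g)));
    }
}
```

Calling `GroupingDemo.Describe(Enumerable.Range(1, 7), 3)` returns `1,2,3 | 4,5,6 | 7`: the last group simply holds whatever is left over.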
Making it an extension method:
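using System.Collections.Generic;
using System.Linq;

static class Extensions
{
    // Splits source into consecutive groups of at most numberOfItemsPerGroup items.
    public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> source, int numberOfItemsPerGroup)
    {
        return source.Select((x, index) => new { Value = x, Index = index })
                     .GroupBy(x => (int)(x.Index / numberOfItemsPerGroup))
                     .Select(x => x.Select(c => c.Value));
    }
}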
Client code:
SourceCollection.Split(numberOfItemsPerGroup);
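To tie this back to the question, here is a rough sketch of how each chunk could go to its own file. It is only one possible arrangement: `ChunkedCsvSketch`, `WriteChunks`, and the `baseName_1.csv`, `baseName_2.csv`, ... naming scheme are assumptions for illustration, and the actual CSV writing is passed in as a delegate so the sketch does not depend on a particular CsvHelper version.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class ChunkedCsvSketch
{
    // Same grouping as Split above, repeated so the sketch compiles on its own.
    // Drop this copy if Extensions.Split is already in scope.
    public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> source, int numberOfItemsPerGroup)
    {
        return source.Select((x, index) => new { Value = x, Index = index })
                     .GroupBy(x => x.Index / numberOfItemsPerGroup)
                     .Select(g => g.Select(c => c.Value));
    }

    // Hypothetical helper: writes every chunk to its own file and returns the paths.
    // writeChunk receives an open writer and the rows that belong in that file.
    public static List<string> WriteChunks<T>(
        IEnumerable<T> rows,
        string directory,
        string baseName,
        int rowsPerFile,
        Action<TextWriter, IEnumerable<T>> writeChunk)
    {
        // CreateDirectory does nothing if the directory already exists.
        Directory.CreateDirectory(directory);

        var paths = new List<string>();
        var fileIndex = 0;
        foreach (var chunk in rows.Split(rowsPerFile))
        {
            // Assumed naming scheme: baseName_1.csv, baseName_2.csv, ...
            var path = Path.Combine(directory, $"{baseName}_{++fileIndex}.csv");
            using (var sw = new StreamWriter(path))
            {
                writeChunk(sw, chunk);
            }
            paths.Add(path);
        }
        return paths;
    }
}
```

In your case the delegate would wrap the `CsvWriter` calls from the question (header, `NextRecord`, `WriteRecords`), applied to the chunk instead of the whole query, so every file gets its own header. Keep in mind that `GroupBy` has to read the entire source before it yields the first group, so all rows are held in memory at once. If that is a problem and you are on .NET 6 or later, the built-in `Enumerable.Chunk` could be swapped in for `Split`, since it yields one array of rows at a time.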