c# linq c#-3.0 ienumerable

What's a clean way to break up a DataTable into chunks of a fixed size with Linq?


Update: Here's a similar question


Suppose I have a DataTable with a few thousand DataRows in it.

I'd like to break up the table into smaller chunks of rows for processing.

I thought C#3's improved ability to work with data might help.

This is the skeleton I have so far:

DataTable Table = GetTonsOfData();

// Chunks should be any IEnumerable<Chunk> type
var Chunks = ChunkifyTableIntoSmallerChunksSomehow; // ** help here! **

foreach(var Chunk in Chunks)
{
   // Chunk should be any IEnumerable<DataRow> type
   ProcessChunk(Chunk);
}

Any suggestions on what should replace ChunkifyTableIntoSmallerChunksSomehow?

I'm really interested in how someone would do this with C# 3 tools. If attempting to apply these tools is inappropriate, please explain!


Update 3 (revised chunking as I really want tables, not ienumerables; going with an extension method--thanks Jacob):

Final implementation:

Extension method to handle the chunking:

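using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class HarenExtensions
{
    public static IEnumerable<DataTable> Chunkify(this DataTable table, int chunkSize)
    {
        for (int i = 0; i < table.Rows.Count; i += chunkSize)
        {
            // Clone() copies the schema only, so each chunk starts empty.
            DataTable Chunk = table.Clone();

            // Select() returns all rows as a DataRow[], which Skip/Take can page through.
            foreach (DataRow Row in table.Select().Skip(i).Take(chunkSize))
            {
                Chunk.ImportRow(Row);
            }

            yield return Chunk;
        }
    }
}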

Example consumer of that extension method, with sample output from an ad hoc test:

class Program
{
    static void Main(string[] args)
    {
        DataTable Table = GetTonsOfData();

        foreach (DataTable Chunk in Table.Chunkify(100))
        {
            Console.WriteLine("{0} - {1}", Chunk.Rows[0][0], Chunk.Rows[Chunk.Rows.Count - 1][0]);
        }

        Console.ReadLine();
    }

    static DataTable GetTonsOfData()
    {
        DataTable Table = new DataTable();
        Table.Columns.Add(new DataColumn());

        for (int i = 0; i < 1000; i++)
        {
            DataRow Row = Table.NewRow();
            Row[0] = i;

            Table.Rows.Add(Row);
        }

        return Table;
    }
}

Solution

  • This seems like an ideal use case for Linq's Skip and Take methods, depending on what you want to achieve with the chunking. This code is completely untested and was never entered in an IDE, but your method might look something like this.

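    private List<List<DataRow>> ChunkifyTable(DataTable table, int chunkSize)
    {
        List<List<DataRow>> chunks = new List<List<DataRow>>();
        // Step by chunkSize so that a trailing partial chunk is not dropped.
        for (int i = 0; i < table.Rows.Count; i += chunkSize)
        {
            // DataRowCollection is a non-generic IEnumerable, so Cast<DataRow>() is needed before Skip/Take.
            chunks.Add(table.Rows.Cast<DataRow>().Skip(i).Take(chunkSize).ToList());
        }

        return chunks;
    }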
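
    If the repeated Skip calls are a concern on a large table (each one walks the rows from the start again), a single-pass variant is possible. The following is only a sketch under that assumption; the class name ChunkSketch and the method name InChunksOf are hypothetical and do not come from the question or the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class ChunkSketch
{
    // Hypothetical helper: walks the rows once and yields lists of at most chunkSize rows.
    public static IEnumerable<List<DataRow>> InChunksOf(this DataTable table, int chunkSize)
    {
        // Because this is an iterator, the check runs when enumeration starts, not at the call site.
        if (chunkSize <= 0)
            throw new ArgumentOutOfRangeException("chunkSize");

        List<DataRow> current = new List<DataRow>(chunkSize);
        foreach (DataRow row in table.Rows)
        {
            current.Add(row);
            if (current.Count == chunkSize)
            {
                yield return current;
                current = new List<DataRow>(chunkSize);
            }
        }

        // Emit the trailing partial chunk, if any.
        if (current.Count > 0)
            yield return current;
    }
}
```

    With the skeleton from the question, this could be consumed as foreach (var Chunk in Table.InChunksOf(100)) { ProcessChunk(Chunk); }, since each List<DataRow> is also an IEnumerable<DataRow>.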