Tags: c#, .net, linq, datatable

An efficient way to convert from IEnumerable<T> to DataTable


Before this gets marked as a duplicate: I have seen many answers like this one, Convert IEnumerable to DataTable, and have attempted something similar by creating an extension method. I am asking because the issue I'm having may lie somewhere else.

Essentially I have quite a large IEnumerable<T> (around 16-17 million items). Up to this point I have not really had any issues with it, until I tried converting it to a DataTable using this extension method:

/// <summary>
/// Converts an IEnumerable to a DataTable. Mainly for use with SqlBulkCopy.
/// </summary>
/// <typeparam name="T"></typeparam>
/// <param name="collection"></param>
/// <param name="customColumnOrder">Custom order for columns allows me to make sure that the order of columns will always be the same. Am open for suggestions for better ways to do this</param>
/// <returns></returns>
public static DataTable ToDataTable<T>(this IEnumerable<T> collection, List<Tuple<string, int, int>> customColumnOrder)
{
    DataTable dt = new DataTable();
    var type = collection.First().GetType();

    foreach (var column in customColumnOrder)
    {
        dt.Columns.Add(column.Item1, Nullable.GetUnderlyingType(type.GetProperty(column.Item1).PropertyType) ?? type.GetProperty(column.Item1).PropertyType);
    }

    // Populate the table
    foreach (T item in collection)
    {
        DataRow dr = dt.NewRow();
        dr.BeginEdit();

        foreach (var column in customColumnOrder)
        {
            dr[column.Item1] = type.GetProperty(column.Item1).GetValue(item) ?? DBNull.Value;
        }

        dr.EndEdit();
        dt.Rows.Add(dr);
    }

    return dt;
}

This works fine for smaller collections of around 100,000 items, but it really begins to struggle once it gets into the millions; I just keep getting timeouts. Is there a more efficient, or generally better, way to convert from an IEnumerable<T> to a DataTable?

I am converting to a DataTable so I can use SqlBulkCopy to get the data into the database.

EDIT: here's where the data is being passed in from:

    /// <summary>
    /// SqlBulkCopy for saving large amounts of data
    /// </summary>
    /// <typeparam name="T"></typeparam>
    /// <param name="dataToSave"></param>
    /// <param name="modelManager">Custom manager to use alongside the model</param>
    /// <param name="conn">Connection string to DB</param>
    public void BatchSave<T>(IEnumerable<T> dataToSave, IData modelManager, string conn)
    {
        var model = dataToSave.First();

        using (SqlConnection sqlconn = new SqlConnection(conn))
        {
            sqlconn.Open();

            using (SqlCommand cmd = new SqlCommand(GetCreateScript(modelManager, model), sqlconn))
            {
                //Create temp table to do initial insert into
                cmd.ExecuteNonQuery();

                SqlBulkCopy copy = new SqlBulkCopy(cmd.Connection);

                copy.DestinationTableName = "#tempTableForImport";

                // Convert data to DataTable
                DataTable dt = dataToSave.ToDataTable(modelManager.GetDataColumnsOrder());

                // Copy to temp table
                copy.WriteToServer(dt);
            }

            using (SqlCommand cmd = new SqlCommand(modelManager.GetInsertSproc(), sqlconn) { CommandType=CommandType.StoredProcedure })
            {
                // Clean up data and move to final table
                cmd.ExecuteNonQuery();
            }

            sqlconn.Close();
        }
    }

EDIT #1: newly amended code using a suggestion made; it's now using FastMember:

public void BatchSave<T>(IEnumerable<T> dataToSave, IData modelManager, string conn)
{
    var model = dataToSave.First();

    using (SqlConnection sqlconn = new SqlConnection(conn))
    {
        sqlconn.Open();

        using (var bcp = new SqlBulkCopy(sqlconn))
        {
            using (var reader = ObjectReader.Create(dataToSave, modelManager.GetDataColumnsOrder().Select(s => s.Item1).ToArray()))
            {
                using (SqlCommand cmd = new SqlCommand(GetCreateScript(modelManager, model), sqlconn))
                {
                    cmd.ExecuteNonQuery();
                    bcp.DestinationTableName = "#tempTableForImport";
                    bcp.WriteToServer(reader);
                }

                using (SqlCommand cmd = new SqlCommand(modelManager.GetInsertSproc(), sqlconn) { CommandType = CommandType.StoredProcedure })
                {
                    cmd.ExecuteNonQuery();
                }
            }
        }

        sqlconn.Close();
    }
}

This has sped things up; however, I am still getting a "Timeout Expired" (after about 30 seconds) on the line bcp.WriteToServer(reader);.

Thanks all for the help so far. Any more ideas on this? Maybe some way to increase the length of time before the timeout?


Solution

  • Instead of going through a DataTable, I would implement an IDataReader for your collection and feed that to SqlBulkCopy. If done correctly, and using a lazy IEnumerable, it will be much faster and use much less memory than the DataTable route. Mark Gravell has already written such a library for converting IEnumerables to an IDataReader, and I would recommend you check it out before rolling your own.

    FastMember can be found on NuGet here: https://www.nuget.org/packages/FastMember/, with the original source here: https://code.google.com/p/fast-member/ and an example in this thread: SqlBulkCopy from a List<>. A minimal sketch of the approach is shown below.
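    As a rough, self-contained sketch of that pattern (the Person type, dbo.People table, and column names here are placeholders for illustration, not part of the question's code):

    using System.Collections.Generic;
    using System.Data.SqlClient;
    using FastMember;

    public class Person
    {
        public int Id { get; set; }
        public string Name { get; set; }
    }

    public static class BulkLoader
    {
        public static void BulkInsert(IEnumerable<Person> people, string connectionString)
        {
            using (var conn = new SqlConnection(connectionString))
            {
                conn.Open();

                using (var bcp = new SqlBulkCopy(conn))
                // ObjectReader exposes the collection as an IDataReader and streams rows lazily,
                // so the whole collection never has to be materialised in memory at once
                using (var reader = ObjectReader.Create(people, "Id", "Name"))
                {
                    bcp.DestinationTableName = "dbo.People";
                    bcp.WriteToServer(reader);
                }
            }
        }
    }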

    UPDATE: You may also need to change your command timeout and set the batch size on the SqlBulkCopy, like this:

    using (SqlCommand cmd = new SqlCommand(modelManager.GetInsertSproc(), sqlconn) { 
      CommandType = CommandType.StoredProcedure, CommandTimeout=300 })
    

    and

    bcp.BatchSize = 100000;
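
    Since the timeout is raised by WriteToServer itself, the bulk copy's own timeout is worth raising too: SqlBulkCopy.BulkCopyTimeout defaults to 30 seconds (which matches the behaviour described above), and a value of 0 means no limit. For example:

    bcp.BulkCopyTimeout = 300;  // seconds; the default is 30, and 0 waits indefinitely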