I am looping through a collection (DataTable) where each item has a "group number" and building a dictionary where each key corresponds to the group number. There's about a million records in the DataTable and I am running into memory issues when this runs. I was wondering if I could get some feedback on how to make this perform better. I think maybe one issue is where I am creating a new tempList
on each iteration.
I really appreciate any feedback here.
Dictionary<int, IEnumerable<Item>> itemGroups = new Dictionary<int, IEnumerable<Item>>();
foreach (DataRow row in dtItems.Rows)
{
Item item = new Item(row["ID"].ToString(),
row["Name"].ToString(),
row["Description"].ToString());
Int32.TryParse(row["GroupNum"].ToString(), out int groupNum);
if (!itemGroups.ContainsKey(groupNum))
{
List<Item> itemList = new List<Item>();
itemList.Add(item);
itemGroups.Add(groupNum, itemList);
}
else
{
var tempList = itemGroups[groupNum].ToList();
tempList.Add(item);
itemGroups[groupNum] = tempList;
}
}
You can simplify it by using LINQ. The GroupBy
extension method is already optimized for aggregating items.
Dictionary<int, List<Item>> itemGroups = dtItems.Rows.Cast<DataRow>()
.GroupBy(row => (int)row["GroupNum"])
.ToDictionary(g => g.Key,
g => g.Select(row => new Item(
row["ID"].ToString(),
row["Name"].ToString(),
row["Description"].ToString())
).ToList());
As other have already pointed out, don't convert values if you can cast them. If the GroupNum
column contains an int
, then simply cast the value to int
. However, if a column is nullable, then it can contain a DBNull.Value
. Let's assume that ID
and Name
are NOT NULL columns but that the Description
is optional and therefore a NULL column. Then you could optimize the creation of an item like this:
g => g.Select(row => new Item(
(string)row["ID"],
(string)row["Name"],
row["Description"].ToString())
Note also that an IEnumerable<T>
queries the source every time it is enumerated. An IEnumerable<T>
is not a collection, but a way to iterate over a data source. We could have dropped the .ToList()
and typed the dictionary values as IEnumerable<Item>
instead. But with the consequence that enumerating the dictionary values would have executed the whole Select
including the new Item(...)
every time.