Search code examples
c#performancelinqdynamics-crmprocessing-efficiency

Most efficient way to search enumerable


I am writing a small program that takes in a .csv file as input with about 45k rows. I am trying to compare the contents of this file with the contents of a table on a database (SQL Server through dynamics CRM using Xrm.Sdk if it makes a difference).

In my current program (which takes about 25 minutes to compare - the file and database are the exact same here both 45k rows with no differences), I have all existing records on the database in a DataCollection<Entity> which inherits Collection<T> and IEnumerable<T>

In my code below I am filtering using the Where method and then doing a logic based the count of matches. The Where seems to be the bottleneck here. Is there a more efficient approach than this? I am by no means a LINQ expert.

foreach (var record in inputDataLines)
{
    var fields = record.Split(',');

    var fund = fields[0];
    var bps = Convert.ToDecimal(fields[1]);
    var withdrawalPct = Convert.ToDecimal(fields[2]);
    var percentile = Convert.ToInt32(fields[3]);
    var age = Convert.ToInt32(fields[4]);
    var bombOutTerm = Convert.ToDecimal(fields[5]);

    var matchingRows = existingRecords.Entities.Where(r => r["field_1"].ToString() == fund
                                      && Convert.ToDecimal(r["field_2"]) == bps
                                      && Convert.ToDecimal(r["field_3"]) == withdrawalPct
                                      && Convert.ToDecimal(r["field_4"]) == percentile
                                      && Convert.ToDecimal(r["field_5"]) == age);

    entitiesFound.AddRange(matchingRows);

    if (matchingRows.Count() == 0)
    {
        rowsToAdd.Add(record);
    }
    else if (matchingRows.Count() == 1)
    {
        if (Convert.ToDecimal(matchingRows.First()["field_6"]) != bombOutTerm)
        {
            rowsToUpdate.Add(record);
            entitiesToUpdate.Add(matchingRows.First());
        }
    }
    else
    {
        entitiesToDelete.AddRange(matchingRows);
        rowsToAdd.Add(record);
    }
}

EDIT: I can confirm that all existingRecords are in memory before this code is executed. There is no IO or DB access in the above loop.


Solution

  • Himbrombeere is right, you should execute the query first and put the result into a collection before you use Any, Count, AddRange or whatever method will execute the query again. In your code it's possible that the query is executed 5 times in every loop iteration.

    Watch out for the term deferred execution in the documentation. If a method is implemented in that way, then it means that this method can be used to construct a LINQ query(so you can chain it with other methods and at the end you have a query). But only methods that don't use deferred execution like Count, Any, ToList(or a plain foreach) will actually execute it. If you dont want that the whole query is executed everytime and you have to access this query multiple times , it's better to store the result in a collection(.f.e with ToList).

    However, you could use a different approach which should be much more efficient, a Lookup<TKey, TValue> which is similar to a dictionary and can be used with an anonymous type as key:

    var lookup = existingRecords.Entities.ToLookup(r => new 
    {
        fund = r["field_1"].ToString(),
        bps = Convert.ToDecimal(r["field_2"]),
        withdrawalPct =  Convert.ToDecimal(r["field_3"]),
        percentile = Convert.ToDecimal(r["field_4"]),
        age = Convert.ToDecimal(r["field_5"])
    });
    

    Now you can access this lookup in the loop very efficiently.

    foreach (var record in inputDataLines)
    {
        var fields = record.Split(',');
        var fund = fields[0];
        var bps = Convert.ToDecimal(fields[1]);
        var withdrawalPct = Convert.ToDecimal(fields[2]);
        var percentile = Convert.ToInt32(fields[3]);
        var age = Convert.ToInt32(fields[4]);
        var bombOutTerm = Convert.ToDecimal(fields[5]);
    
        var matchingRows = lookup[new {fund, bps, withdrawalPct, percentile, age}].ToList();
    
        entitiesFound.AddRange(matchingRows);
    
        if (matchingRows.Count() == 0)
        {
            rowsToAdd.Add(record);
        }
        else if (matchingRows.Count() == 1)
        {
            if (Convert.ToDecimal(matchingRows.First()["field_6"]) != bombOutTerm)
            {
                rowsToUpdate.Add(record);
                entitiesToUpdate.Add(matchingRows.First());
            }
        }
        else
        {
            entitiesToDelete.AddRange(matchingRows);
            rowsToAdd.Add(record);
        }
    }
    

    Note that this will work even if the key does not exist(an empty list is returned).