Search code examples
c#performancelinqt-sqldatabase-performance

Compare multi values against multi values, maintaining performance


I have two forms of records or data. One being Inventory Record and the other a Product.

Both the Inventory Record and Product Record have Code values stored in separate tables. Each record has a value record and is foreign keyed to the main table.

I am trying NOT to write this code below because i know from a performance standpoint it will run for days.

 public void test(product _Product)
        {
            DBDataContext db = new DBDataContext();


            var ProductCodes = db.tbl_ProdCodeValues.Where(x=>x.productID == _Product.productID);

            foreach (var code in ProductCodes)
            {

                var matches = db.InventoryCodeValues.Any(x => x.InventoryValue.ToLower().Contains(code.ProdCodeValue));
            }

        }

What is the best way to compare a collection of values against another collection of values. Please let me know if you think a Task or Multi Threading Approach would also improve performance.

Each product could have a approx 1-1000 values and the Inventory will ALWAYS have over 1 million values and still to grow. So performance is key because the math will tell you there is a ton of comparisons being performed. I can also move the Query from LINQ to SQL to T-SQL Stored procedures if needed.


Solution

  • Your foreach is like a Select.

    var ProductCodes = db.tbl_ProdCodeValues.Where(x=>x.productID == _Product.productID);
    var withMatches =
     ProductCodes
     .Select(code => new {
      code,
      matches = db.InventoryCodeValues.Any(x => x.InventoryValue.ToLower().Contains(code.ProdCodeValue))
     });
    

    And now all of this remotes to the database. Look at the query plan to see whether this is acceptable already or whether we need to tune this. Because of that non-SARGable predicate this is probably a nasty cross product plus filter.

    What format are the InventoryCodeValues? Is this a delimited list? If you split the list into rows then you can use a simple == predicate and make this query run in linear time instead of quadratic time.