Search code examples
c#entity-frameworkcrud

Best way to do large CRUD operations


I'm working on an application where I need to get large amount of data from a database, do some data manipulation and then insert the data in another database.

However, I am struggling to find the best way to check if a record from the source db is already present in the destination db. If it is not present, I need to add a new record. If it is present, I need to update the existing record.

My code looks something like this (a simplified version):

List<Data> data_from_db = new();

var existingData = await _context.Data.ToListAsync();

var sourceData = await _dbAccess.LoadData<Data, dynamic>(storedProcedure: "sp_ApiGetAllData", new { }, "Db");

data_from_db = sourceData.ToList();

//loop through data. If the datapoint is already present in db, it will be updated. If not, it will be added as a new datapoint.
foreach (var datapoint in data_from_db)
{
    //check if price is already present in db.
    var existingDatapoint = existingData.Find(x => x.ItemId== datapoint.ItemId);

    if (existingDatapoint != null)
    {
        //Update datapoint

        _context.Entry(existingDatapoint).State = EntityState.Modified;
        
        processedIds.Add(existingDatapoint.Id);
    }

    else
    {
        //Create new datapoint
        _context.Data.Add(newDatapoint);
    }
}

await _context.SaveChangesAsync();

This works fine. However, when the DB has +400K rows, the process gets painfully slow. Specifically, it is the "Find" function which takes a lot of time. And of course this makes sense as it is going to do 400k * 400k searches in the list.

Is there a better way to handle this issue?

Update: The application has to do some complicated price calculations (which is why i simplified it in my original post). But to sum it up: I get prices, discount-info and min. amount from the source db, calculate the pricing, then insert the calculated prices in to the destination db.

The tricky part comes when i want to check if the price-info is already present, because a individual price is a combination of Itemnumber, discount group and min. amount.

So in reality the find statement looks like this:

var existingPrice = existingPrices.Find(x => x.ItemNumber == priceEntry.ItemNumber && x.DiscountGroupId == priceEntry.DiscountGroupId && x.MinAmount == priceEntry.MinAmount);

None of the above 3 paramenters is enough by it self, to identify a price. An item can have alot of different prices base on discount groups, but each discountgroup can also several prices base on how many of a given product is ordered (min amount).


Solution

  • Using a ValueTuple, you can create a Dictionary to map the three identifiers of a price to the existing price in the database your are updating. The C# compiler automatically does equality for a ValueTuple based on the values in all of its items, including hash value. Then you can just lookup the existing price, or create a new one if it isn't found:

    var existingPrices = await _context.Prices.ToListAsync();
    
    var sourcePrices = await _dbAccess.LoadData<Prices, dynamic>(storedProcedure: "sp_ApiGetAllData", new { }, "Db");
    
    var existingPriceMap = existingPrices.ToDictionary(p => (p.ItemNumber, p.DiscountGroupId, p.MinAmount));
    //loop through data. If the datapoint is already present in db, it will be updated. If not, it will be added as a new datapoint.
    foreach (var priceEntry in sourcePrices)
    {
        //check if price is already present in db.
        if (existingPriceMap.TryGetValue((priceEntry.ItemNumber, priceEntry.DiscountGroupId, priceEntry.MinAmount), out var existingPrice))
        {
            //Update datapoint
            _context.Entry(existingPrice).State = EntityState.Modified;
    
            processedIds.Add(existingPrice.Id);
        }
        else
        {
            //Create new datapoint
            _context.Data.Add(newDatapoint);
        }
    }
    
    await _context.SaveChangesAsync();