I'm working on an application where I need to get large amount of data from a database, do some data manipulation and then insert the data in another database.
However, I am struggling to find the best way to check if a record from the source db is already present in the destination db. If it is not present, I need to add a new record. If it is present, I need to update the existing record.
My code looks something like this (a simplified version):
List<Data> data_from_db = new();
var existingData = await _context.Data.ToListAsync();
var sourceData = await _dbAccess.LoadData<Data, dynamic>(storedProcedure: "sp_ApiGetAllData", new { }, "Db");
data_from_db = sourceData.ToList();
//loop through data. If the datapoint is already present in db, it will be updated. If not, it will be added as a new datapoint.
foreach (var datapoint in data_from_db)
{
//check if price is already present in db.
var existingDatapoint = existingData.Find(x => x.ItemId== datapoint.ItemId);
if (existingDatapoint != null)
{
//Update datapoint
_context.Entry(existingDatapoint).State = EntityState.Modified;
processedIds.Add(existingDatapoint.Id);
}
else
{
//Create new datapoint
_context.Data.Add(newDatapoint);
}
}
await _context.SaveChangesAsync();
This works fine. However, when the DB has +400K rows, the process gets painfully slow. Specifically, it is the "Find" function which takes a lot of time. And of course this makes sense as it is going to do 400k * 400k searches in the list.
Is there a better way to handle this issue?
Update: The application has to do some complicated price calculations (which is why i simplified it in my original post). But to sum it up: I get prices, discount-info and min. amount from the source db, calculate the pricing, then insert the calculated prices in to the destination db.
The tricky part comes when i want to check if the price-info is already present, because a individual price is a combination of Itemnumber, discount group and min. amount.
So in reality the find statement looks like this:
var existingPrice = existingPrices.Find(x => x.ItemNumber == priceEntry.ItemNumber && x.DiscountGroupId == priceEntry.DiscountGroupId && x.MinAmount == priceEntry.MinAmount);
None of the above 3 paramenters is enough by it self, to identify a price. An item can have alot of different prices base on discount groups, but each discountgroup can also several prices base on how many of a given product is ordered (min amount).
Using a ValueTuple
, you can create a Dictionary
to map the three identifiers of a price to the existing price in the database your are updating. The C# compiler automatically does equality for a ValueTuple
based on the values in all of its items, including hash value. Then you can just lookup the existing price, or create a new one if it isn't found:
var existingPrices = await _context.Prices.ToListAsync();
var sourcePrices = await _dbAccess.LoadData<Prices, dynamic>(storedProcedure: "sp_ApiGetAllData", new { }, "Db");
var existingPriceMap = existingPrices.ToDictionary(p => (p.ItemNumber, p.DiscountGroupId, p.MinAmount));
//loop through data. If the datapoint is already present in db, it will be updated. If not, it will be added as a new datapoint.
foreach (var priceEntry in sourcePrices)
{
//check if price is already present in db.
if (existingPriceMap.TryGetValue((priceEntry.ItemNumber, priceEntry.DiscountGroupId, priceEntry.MinAmount), out var existingPrice))
{
//Update datapoint
_context.Entry(existingPrice).State = EntityState.Modified;
processedIds.Add(existingPrice.Id);
}
else
{
//Create new datapoint
_context.Data.Add(newDatapoint);
}
}
await _context.SaveChangesAsync();