I currently have a service in C# that uses dapper to call a stored procedure that does 2 things: if the customer exists, it grabs the customer GUID
and adds it to the CustomerInformations
table; if the customer doesn't exist, then it inserts the customer and then returns the GUID
and adds it to the CustomerInformations
table.
Previously, inserts were taking around 1.75 million records an hour. Now it can barely get 200,000 records an hour. There's around 75 million records in my CustomerInformations
table and I am looking to fixing the bottleneck.
For each Customer property, it's iteratively calling a stored procedure. Each stored procedure call could have 2 inserts into the database. First, adding the customer into the Customers
table and then adding the property into the CustomerInformations
table. I understand that this may not be the most ideal way to store the data but it's not something I can change.
C# service
foreach (var info request.Data)
{
string sql = "add_one_by_customer";
object parameters = new
{
p_customer_first_name = info.FirstName,
p_customer_last_name = info.LastName,
p_customer_property_name = info.PropertyName,
p_customer_property_value = info.PropertyValue
};
try
{
await db.ExecuteAsync(sql, parameters, transaction: transaction, commandType: CommandType.StoredProcedure);
}
catch (Exception e)
{
throw new Exception($"Failed to insert");
}
}
Postgres stored procedure:
CREATE OR REPLACE PROCEDURE add_one_by_customer(
p_customer_first_name VARCHAR,
p_customer_last_name VARCHAR,
p_customer_property_name VARCHAR,
p_customer_property_value VARCHAR,
)
LANGUAGE plpgsql
AS $procedure$
DECLARE p_customer_id uuid;
p_current_item_value varchar;
begin
SELECT INTO p_customer_id,
customer_id
FROM customers
WHERE customer_first_name = p_customer_first_name AND
customer_last_name = p_customer_last_name
limit 1;
IF (p_customer_id IS NULL) THEN
begin
INSERT INTO customers(customer_first_name, customer_last_name)
VALUES (p_customer_first_name, p_customer_last_name) RETURNING customer_id into p_customer_id;
EXCEPTION WHEN unique_violation THEN
p_customer_id = (SELECT custmomer_id
FROM customers
WHERE customer_first_name = p_customer_first_name AND
customer_last_name = p_customer_last_name
END;
end if;
p_current_item_value := (select property_value
from customer_informations
where customer_id = p_customer_id AND
customer_property_name = p_customer_property_name);
if (p_current_item_value is NULL) THEN
INSERT INTO customer_informations(customer_id, customer_property_name, customer_property_value)
VALUES (p_customer_id, p_customer_property_name, p_customer_property_value);
elseif (p_current_item_value is not null AND p_current_item_value != p_item_value) then
UPDATE customer_informations
SET customer_property_value = p_current_item_value
WHERE customer_id = p_customer_id ;
end if;
end; $procedure$;
Currently my CustomerInformations
table has a unique constraint on Customer_Id, Customer_property_name
.
Things I've tried to enhance:
Any tips or suggestions would be greatly appreciated.
Customer_information unique constraint:
CONSTRAINT ux_customer_informations UNIQUE (customer_id, customer_property_name)
Customers unique constraint:
CONSTRAINT ux_customers UNIQUE (customer_firstname, customer_lastname)
Your current procedure is hugely inefficient. See:
Avoid a nested code block with error handling, that's very expensive. Can be done properly with the "SELECT or INSERT" technique I use below. See:
The second part is an UPSERT in disguise. Also much cheaper now:
CREATE OR REPLACE PROCEDURE dd_one_by_customer(
p_customer_first_name text
, p_customer_last_name text
, p_customer_property_name text
, p_customer_property_value text
)
LANGUAGE plpgsql AS
$proc$
DECLARE
p_customer_id uuid;
p_current_item_value text;
BEGIN
LOOP
SELECT customer_id
FROM customers
WHERE customer_first_name = p_customer_first_name
AND customer_last_name = p_customer_last_name
INTO p_customer_id;
EXIT WHEN FOUND;
INSERT INTO customers
( customer_first_name, customer_last_name)
VALUES (p_customer_first_name, p_customer_last_name)
ON CONFLICT (customer_first_name, customer_last_name) DO NOTHING
RETURNING customer_id
INTO p_customer_id;
EXIT WHEN FOUND;
END LOOP;
INSERT INTO customer_informations
( customer_id, customer_property_name, customer_property_value)
VALUES (p_customer_id, p_customer_property_name, p_customer_property_value)
ON CONFLICT (customer_id, customer_property_name) DO UPDATE
SET customer_property_value = EXCLUDED.customer_property_value
WHERE customer_property_value IS DISTINCT FROM p_current_item_value;
END
$proc$;
This requires a UNIQUE
constraint on each of the two tables - exactly the ones you declared (ux_customer_informations
& ux_customers
).
If neither customer_property_value
nor p_current_item_value
can be null
, simplify the final WHERE
clause to:
...
WHERE customer_property_value <> p_current_item_value;