This is my first time creating a stored procedure in MySQL.
What it does is:
first - get the count of all records
second - loop through the table one row at a time
third - check whether each entry is a duplicate
fourth - insert each duplicate into a temporary table
last - display the duplicates
It works properly on 100-200 entries, but on larger sets of 500+ records (sometimes 25k) it throws this message:
Error Code: 1172. Result consisted of more than one row
I have googled this issue, but none of the answers I found helped me solve my problem.
Please take a look at my script:
BEGIN
    DECLARE n INT DEFAULT 0;
    DECLARE i INT DEFAULT 0;
    DECLARE i_sku VARCHAR(255);
    DECLARE i_concatenated_attributes MEDIUMTEXT;
    DECLARE f_sku VARCHAR(255);
    DECLARE f_offer_type VARCHAR(255);
    DECLARE f_name VARCHAR(255);
    DECLARE f_product_owner VARCHAR(255);
    DECLARE f_listing_city VARCHAR(255);
    DECLARE f_listing_area VARCHAR(255);
    DECLARE f_price DOUBLE;
    DECLARE f_bedrooms INT;
    DECLARE f_building_size INT;
    DECLARE f_land_size INT;
    DECLARE f_concatenated_attributes MEDIUMTEXT;
    DECLARE f_duplicate_percentage INT;

    SELECT COUNT(*) FROM unit_temp_listing INTO n;
    CREATE TEMPORARY TABLE IF NOT EXISTS temp_temp (dup_sku VARCHAR(255), dup_percentage INT, attribs MEDIUMTEXT);

    SET i=0;
    WHILE i<n DO
        -- Get all unit listings (one by one)
        SELECT
            sku, concat_ws(',',offer_type,name,product_owner,listing_city,listing_area,price,ifnull(bedrooms,0),ifnull(building_size,0),ifnull(land_size,0)) as concatenated_attributes
        INTO i_sku, i_concatenated_attributes
        FROM unit_temp_listing
        LIMIT 1 OFFSET i;

        -- Compare one by one (sadla)
        SELECT
            f.sku, f.offer_type, f.name, f.product_owner, f.listing_city, f.listing_area, f.price, f.bedrooms, f.building_size, f.land_size,
            levenshtein_ratio(concat_ws(',',f.offer_type,f.name,f.product_owner,f.listing_city,f.listing_area,f.price,ifnull(f.bedrooms,0),ifnull(f.building_size,0),ifnull(f.land_size,0)),i_concatenated_attributes) as f_duplicate_percentage,
            concat_ws(',',f.offer_type,f.name,f.product_owner,f.listing_city,f.listing_area,f.price,ifnull(f.bedrooms,0),ifnull(f.building_size,0),ifnull(f.land_size,0)) as fconcatenated_attributes
        INTO f_sku, f_offer_type, f_name, f_product_owner, f_listing_city, f_listing_area, f_price, f_bedrooms, f_building_size, f_land_size, f_duplicate_percentage, f_concatenated_attributes
        FROM unit_temp_listing f
        WHERE substring(soundex(concat_ws(',',offer_type,name,product_owner,listing_city,listing_area,price,ifnull(bedrooms,0),ifnull(building_size,0),ifnull(land_size,0))),1,10) = substring(soundex(i_concatenated_attributes),1,10)
          AND levenshtein_ratio(concat_ws(',',offer_type,name,product_owner,listing_city,listing_area,price,ifnull(bedrooms,0),ifnull(building_size,0),ifnull(land_size,0)),i_concatenated_attributes) > 90
          AND f.sku != i_sku;

        -- INSERT duplicates
        IF (f_sku IS NOT NULL) THEN
            INSERT INTO temp_temp (dup_sku, dup_percentage, attribs) VALUES (f_sku, f_duplicate_percentage, f_concatenated_attributes);
            SET f_sku = null;
            SET f_duplicate_percentage = null;
            SET f_concatenated_attributes = null;
        END IF;

        SET i = i + 1;
    END WHILE;

    SELECT * FROM temp_temp;
    DROP TABLE temp_temp;
END
What is the problem?
Hello to my fellow developers out there. I have already solved my issue and want to share the solution here.
The issue was that the second SELECT statement inside the WHILE loop was returning multiple rows, which SELECT ... INTO cannot store in scalar variables. I tried using a CURSOR but still got the error message. So I put that second SELECT statement inside an INSERT statement, like this:
INSERT INTO table_name (columns, ...) SELECT_STATEMENT
After that, the problem was solved! But if anyone has an idea for optimizing my query, please help me. I have to process 20k+ records, but because execution takes too long, I have settled for 500 records, which takes 15-20 minutes.
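For anyone landing here later, this is a rough sketch of how the INSERT ... SELECT form could be applied to the procedure body. It goes one step further than the fix described above by joining the table to itself, so the WHILE loop and the per-row variables are no longer needed. The aliases i and f and the column alias attribs are names made up for this sketch, levenshtein_ratio is the user-defined function from the script (not a MySQL built-in), and it assumes unit_temp_listing is a regular table, because MySQL cannot reference a TEMPORARY table twice in one query. I have not measured whether this is actually faster.

```sql
-- Sketch only: levenshtein_ratio() is the user-defined function from the
-- script, and unit_temp_listing is assumed to be a regular (non-TEMPORARY) table.
CREATE TEMPORARY TABLE IF NOT EXISTS temp_temp (
    dup_sku        VARCHAR(255),
    dup_percentage INT,
    attribs        MEDIUMTEXT
);

-- One set-based statement replaces the loop: every row (i) is paired with
-- every other row (f) whose concatenated attributes sound alike and whose
-- Levenshtein ratio exceeds 90. Multiple matches per row are simply
-- inserted as multiple rows, so error 1172 cannot occur.
INSERT INTO temp_temp (dup_sku, dup_percentage, attribs)
SELECT
    f.sku,
    levenshtein_ratio(f.attribs, i.attribs),
    f.attribs
FROM (
    -- Build the concatenated attribute string once per row.
    SELECT sku,
           CONCAT_WS(',', offer_type, name, product_owner, listing_city,
                     listing_area, price, IFNULL(bedrooms, 0),
                     IFNULL(building_size, 0), IFNULL(land_size, 0)) AS attribs
    FROM unit_temp_listing
) AS i
JOIN (
    SELECT sku,
           CONCAT_WS(',', offer_type, name, product_owner, listing_city,
                     listing_area, price, IFNULL(bedrooms, 0),
                     IFNULL(building_size, 0), IFNULL(land_size, 0)) AS attribs
    FROM unit_temp_listing
) AS f
  ON f.sku != i.sku
 AND SUBSTRING(SOUNDEX(f.attribs), 1, 10) = SUBSTRING(SOUNDEX(i.attribs), 1, 10)
WHERE levenshtein_ratio(f.attribs, i.attribs) > 90;

SELECT * FROM temp_temp;
DROP TEMPORARY TABLE temp_temp;
```

The comparison is still pairwise, so the number of levenshtein_ratio calls grows quickly with the row count. The cheaper SOUNDEX condition is kept in the join so that, where the optimizer evaluates it first, fewer pairs reach the expensive function.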
Thank you for the 18 views (at the time of writing).
Happy coding!