There is a un normalized table in my database with name details
structure and sample data as below(apologies for image, just thought it would be more comprehensible):
My challenge is to split columns - assignee, inventor and ipcsubclass
using delimitor |
into new tables {detail_inv
and inventors
}, {detail_asg
and assignees
} and {detail_ipc
and ipcsubclasses
}.
In all three cases, table schemas are similar. For example, columns on inventors table- id
and name
and on detail_inv table- detail_id
and inventor_id
. There must be only one name per row with all names unique in inventors table and ids to hold relationship in detail_inv table.
I tried stored procedure with below code for inventors- I made 3 procedures for 3 columns:(
drop procedure if exists normalise_details;
delimiter #
create procedure normalise_details()
proc_main:begin
declare v_cursor_done int unsigned default 0;
declare v_post_id int unsigned;
declare v_tags varchar(2048);
declare v_keyword varchar(50);
declare v_keyword_id mediumint unsigned;
declare v_tags_done int unsigned;
declare v_tags_idx int unsigned;
declare v_cursor cursor for select id, inventor from details order by id;
declare continue handler for not found set v_cursor_done = 1;
set autocommit = 0;
open v_cursor;
repeat
fetch v_cursor into v_post_id, v_tags;
set v_tags_done = 0;
set v_tags_idx = 1;
while not v_tags_done do
set v_keyword = substring(v_tags, v_tags_idx,
if(locate('|', v_tags, v_tags_idx) > 0,
locate('|', v_tags, v_tags_idx) - v_tags_idx,
length(v_tags)));
if length(v_keyword) > 0 then
set v_tags_idx = v_tags_idx + length(v_keyword) + 1;
set v_keyword = trim(v_keyword);
insert into inventors (name) values (v_keyword);
select id into v_keyword_id from inventors where name = v_keyword;
insert into details_inv (inventor_id, detail_id) values (v_keyword_id, v_post_id);
else
set v_tags_done = 1;
end if;
end while;
until v_cursor_done end repeat;
close v_cursor;
commit;
end proc_main #
delimiter ;
When I try this on some random test data, it works fine. when i do this on actual table, it doesn't work well. Only partial data is inserted. SQL throws no errors (except for some times: "#1172 - Result consisted of more than one row" or "inventor_id column can't be null")
I tried modifying the code at MySQL - Insert comma separated list into normalized tables via stored procedure to suit my needs but i failed.
Please help me, my DB table has become a mess and there are about 500,000 rows making it really difficult for me to explode and manage huge arrays on each project(recent project with ~200,000 rows).
Looking at RolandoMySQLDBA's post to this dba.stackexchange question I feel confirmed in my initial reservations regarding triggered stored procedures. However, if you are certain that at any given time only a few lines are changed by user input then it should be possible to put together a fast acting procedure.
However, if there are many users working in parallel they might still lock each other out. I don't know whether this will really happen since the stored procedure will not change anything in the details
table. If necessary you could check out this page for ideas.
Edit: TRIGGER
I just extended the SQLfiddle of my previous post to this SQLfiddle with trigger, containing the following:
CREATE TRIGGER normdet AFTER INSERT ON detail FOR EACH ROW
BEGIN
DECLARE n int; DECLARE word VARCHAR(64)
;SET n=cntparts(NEW.inventor)
;WHILE n>0 DO
SET word=part(new.inventor,n)
;IF NOT EXISTS (SELECT * FROM inv WHERE invname=word) THEN
INSERT INTO inv (invname) VALUES (word)
;END IF
;INSERT INTO det2inv (didid,diiid)
SELECT NEW.id,invid FROM inv WHERE invname=word
;SET n=n-1
;END WHILE
-- and similar loops for assignee and cls ...
;END;
I also defined another function
CREATE FUNCTION cntparts (var varchar(1024)) RETURNS int
RETURN 1+LENGTH(var)-LENGTH(REPLACE(var,'|',''));
counting the words in a given varchar
. This can also be used to create loops instead of my fixed UNION
constructs for the base conversion in my first post.
The trigger now takes care of all new INSERT
s. A similar trigger still needs to be written to do the same for UPDATE
s. That should not be too hard to do ...
In my SQLfiddle I inserted another row into detail
after the trigger definition. The results are listed by two comparative SELECT statements, see fiddle.
reply to last comment:
Well, as I suggested in my original answer, you should first import all of the data (without having any triggers installed!!!!) and then plough through the detail
-table with the SELECT/UNION
statements. Before you do that you should find out the maximum number of words in each of the columns assignee
,inventor
and ipsubclass
by using
SELECT MAX(cntparts(inventor)) invcnt,
MAX(cntparts(assignee)) asscnt,
MAX(cntparts(ipsubclass)) clscnt
FROM detail
You can then adjust the number of SELECT/UNION
statments needed for each column. Then fill the link tables as shown in the SQLfiddle.
Maybe the whole process takes a while but you can safely work on one table after another (first the actual attribute table and then the associated link-table).
After that you can activate your trigger which should then only work on individually added lines.