Suppose, that I have a following resultset:
SELECT
*
FROM
(
SELECT 1 as `no`, NULL as `sequence`
UNION ALL
SELECT 2, ''
UNION ALL
SELECT 3, '1'
UNION ALL
SELECT 4, '1,2,3,4,5'
UNION ALL
SELECT 5, '2,4,5'
UNION ALL
SELECT 6, '1, 5'
UNION ALL
SELECT 7, '1,3,5'
) as `sub`;
My task was to count sequence breaks / holes for each sequence
listed in below. I've written following stored function:
DELIMITER $$
DROP FUNCTION IF EXISTS `countSequenceBreaks`$$
CREATE FUNCTION `countSequenceBreaks`(`sequence` VARCHAR(1000))
RETURNS INT
DETERMINISTIC
BEGIN
DECLARE `delimiter` CHAR DEFAULT ',';
DECLARE `current`, `last` INT;
DECLARE `result` INT DEFAULT 0;
IF
`sequence` IS NULL
OR
NOT LENGTH(`sequence`)
OR
NOT INSTR(`sequence`, `delimiter`)
THEN RETURN `result`;
END IF;
SET `current` = SUBSTRING_INDEX(`sequence`, `delimiter`, 1);
SET `last` = SUBSTRING_INDEX(`sequence`, `delimiter`, -1);
IF `last` < `current`
THEN
SET `result` = `last`;
SET `last` = `current`;
SET `current` = `result`;
SET `result` = 0;
END IF;
WHILE `current` < `last` DO
IF NOT FIND_IN_SET(`current`, `sequence`)
THEN SET `result` = `result` + 1;
END IF;
SET `current` = `current` + 1;
END WHILE;
RETURN `result`;
END$$
DELIMITER ;
But I'm worried about WHILE
-loop might take too much iterations for different sequence members and cause query slowdown.
Questions:
My debug query:
SELECT
`no`, `sequence`, `countSequenceBreaks`(`sequence`)
FROM
(
SELECT 1 as `no`, NULL as `sequence`
UNION ALL
SELECT 2, ''
UNION ALL
SELECT 3, '1'
UNION ALL
SELECT 4, '1,2,3,4,5'
UNION ALL
SELECT 5, '2,4,5'
UNION ALL
SELECT 6, '1, 5'
UNION ALL
SELECT 7, '1,3,5'
) as `sub`;
It's resultset:
no sequence `countSequenceBreaks`(`sequence`)
-----------------------------------------------
1 NULL 0
2 0
3 1 0
4 1,2,3,4,5 0
5 2,4,5 1
6 1,5 3
7 1,3,5 2
Regards.
You can do it with one simple query:
select sequence,
CASE WHEN NOT INSTR(IFNULL(sequence,''), ',') THEN 0
ELSE
(
SUBSTRING_INDEX(sequence,',' ,-1)
-SUBSTRING_INDEX(sequence,',' , 1)
)
-
(LENGTH(sequence)-LENGTH(REPLACE(sequence,',','')))
END countSequenceBreaks
from t
How to find count of sequence breaks?
For example for 1,3,5
sequence.
All we need to know breaks count is to calculate count of missed delimiters. In this case the full string 1,2,3,4,5
contains 5-1=4
delimiters but the 1,3,5
sequence contains only 2 delimiters so count of breaks (missed digits - what is equal to count of missed delimiters as you can see) = 4-2 = 2
How to know count of delimiters in the string?
In our case when delimiter has one symbol length it is (LENGTH(sequence)-LENGTH(REPLACE(sequence,',',''))