I have a MySQL function that runs as part of a larger query reading a few million records. To detect anomalies, I'm calculating the average change over time. The data in the table is stored as JSON objects with UNIX timestamps as keys, covering up to 30 days.
As an example, the input (input_array) would look something like:
[{"1532944806": 16}, {"1533031206": 14}, {"1533117605": 13}, {"1533204305": 12}, {"1533290708": 10}, {"1533463506": 9}, {"1533549907": 9}, {"1533636306": 9}, {"1533722707": 9}, {"1533809108": 9}, {"1533895506": 9}, {"1533981906": 8}, {"1534068306": 7}, {"1534154706": 7}, {"1534241108": 7}, {"1534590304": 7}, {"1534673106": 12}, {"1534759508": 6}, {"1534845905": 7}, {"1534932306": 7}, {"1535018707": 5}, {"1535105106": 3}, {"1535191505": 7}, {"1535277907": 6}, {"1535364305": 7}, {"1535450706": 2}, {"1535537107": 1}]
I'm only looking to average decreasing changes, not any change that increases from one day to the next.
I check that a value exists for the previous day and, if so, calculate the change and insert it into a temporary table, which is then queried for the average.
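To make the intended calculation concrete, here is the sample array worked through in Python, counting only strictly decreasing day-over-day changes (note that WHERE change > -1 in the function below would also admit zero-change days, whereas this sketch follows the stated goal of strict decreases):

```python
# Values copied from the sample input_array, in timestamp order.
values = [16, 14, 13, 12, 10, 9, 9, 9, 9, 9, 9, 8, 7, 7, 7, 7,
          12, 6, 7, 7, 5, 3, 7, 6, 7, 2, 1]

# Keep only the strictly decreasing day-over-day drops.
drops = [prev - cur for prev, cur in zip(values, values[1:]) if cur < prev]

print(sum(drops) / len(drops))  # 2.0 for this sample
```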
So far I have:
CREATE FUNCTION `daily_averages`(input_array JSON) RETURNS int(4)
READS SQL DATA
DETERMINISTIC
BEGIN
DECLARE array_length INTEGER(2);
DECLARE prev_value INTEGER(4);
DECLARE idx INTEGER(4);
DROP TEMPORARY TABLE IF EXISTS collection;
CREATE TEMPORARY TABLE collection (`change` INTEGER(4) SIGNED DEFAULT 0); -- CHANGE is a reserved word, so it must be quoted
SELECT JSON_LENGTH(input_array) INTO array_length;
SET idx = 0;
WHILE idx < array_length DO
SELECT
IF(idx-1 > -1,
CONVERT(
JSON_EXTRACT(
JSON_EXTRACT(
JSON_EXTRACT( input_array, CONCAT( '$[', idx-1, ']' ) )
, '$.*'
)
, '$[0]'
), SIGNED INTEGER
)
, -1
)
INTO prev_value;
INSERT INTO collection
SELECT (prev_value -
(
CONVERT(
JSON_EXTRACT(
JSON_EXTRACT(
JSON_EXTRACT( input_array, CONCAT( '$[', idx, ']' ) )
, '$.*'
)
, '$[0]'
), SIGNED INTEGER
)
)
)
FROM DUAL
WHERE prev_value > 0;
SET idx = idx + 1;
END WHILE;
RETURN (SELECT AVG(`change`) FROM collection WHERE `change` > -1);
END
With about 2.7 million records, it currently takes about 20 minutes to run. I'm looking to optimize this, or rewrite it to avoid the DROP/CREATE overhead.
It seems unnecessary to create a table just to calculate an average; that's simple to do in the loop. Instead of inserting each value into a table, add it to a running total and increment a counter. At the end, return total / counter.
Since you're totalling the differences between consecutive values, you can also carry the previous value forward in a variable, so each element is extracted from the JSON only once per iteration.
You can also use SET statements to assign variables, rather than SELECT ... INTO variable.
DECLARE array_length INTEGER(2);
DECLARE prev_value INTEGER(4);
DECLARE idx INTEGER(4);
DECLARE total INTEGER(4);
DECLARE counter INTEGER(4);
DECLARE cur_value INTEGER(4);
SET array_length = JSON_LENGTH(input_array);
SET total = 0;
SET counter = 0;
-- Initialize prev_value to the first element
SET prev_value = CONVERT(
JSON_EXTRACT(
JSON_EXTRACT(
JSON_EXTRACT( input_array, '$[0]' )
, '$.*'
)
, '$[0]'
), SIGNED INTEGER
);
SET idx = 1;
WHILE idx < array_length DO
SET cur_value = CONVERT(
JSON_EXTRACT(
JSON_EXTRACT(
JSON_EXTRACT( input_array, CONCAT( '$[', idx, ']' ) )
, '$.*'
)
, '$[0]'
), SIGNED INTEGER
);
IF cur_value < prev_value
THEN
SET total = total + (prev_value - cur_value);
SET counter = counter + 1;
END IF;
SET prev_value = cur_value;
SET idx = idx + 1;
END WHILE;
-- Guard against division by zero when no decreasing changes were found
RETURN IF(counter = 0, 0, total / counter);
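As a sanity check, the same loop logic translated to Python reproduces the expected result on the question's sample data (a sketch assuming the strictly-decreasing rule, with a guard for the case where nothing decreases):

```python
# Sample values from the question's input_array, in timestamp order.
values = [16, 14, 13, 12, 10, 9, 9, 9, 9, 9, 9, 8, 7, 7, 7, 7,
          12, 6, 7, 7, 5, 3, 7, 6, 7, 2, 1]

total = 0
counter = 0
prev_value = values[0]          # initialize from the first element
for cur_value in values[1:]:
    if cur_value < prev_value:  # only strictly decreasing changes
        total += prev_value - cur_value
        counter += 1
    prev_value = cur_value

# Return 0 rather than dividing by zero when no change decreased.
result = total / counter if counter else 0
print(result)  # 2.0 for this sample
```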