I have this table where long strings are saved by splitting them across rows in the VALUE column, grouped by GROUPING_ID:
CREATE TABLE input_table
(
CHUNK_ID BIGINT NOT NULL
PRIMARY KEY,
VALUE VARCHAR(5000) NULL,
GROUPING_ID BIGINT NULL
);
I want to resize the VALUE column from VARCHAR(5000) to VARCHAR(4000).
Values are grouped by GROUPING_ID, so a string with 24123 chars is initially split into 5 rows: 4 of 5000 chars and 1 of 4123 chars.
To reduce the column size, I need to concatenate the grouped strings and then re-split them by 4000, so the same 24123-char string will be saved on 7 rows: 6 of 4000 chars and 1 of 123 chars.
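The expected row counts above can be sanity-checked directly in MySQL; this is just the worked arithmetic for the example string, not part of the migration itself:

```sql
-- For a 24123-char string re-split into 4000-char chunks:
SELECT CEIL(24123 / 4000) AS row_count,      -- 7 rows
       24123 MOD 4000     AS last_chunk_len; -- the last row holds 123 chars
```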
I want to do this in MySQL without using functions.
After some research I managed to concatenate the values, and with the help of ChatGPT I managed to split the concatenated string back into smaller parts.
This solution keeps the chunks in the right order and lets you configure a smaller or bigger chunk size, so it can be used either to reduce or to increase the column size.
To re-split the VALUE column by 4000, this can be used:
SET @chunk_size = 4000;
SET group_concat_max_len = @chunk_size * 50 * 2; -- chunk_size x 50 unions below x 2 bytes
START TRANSACTION;
DELETE
FROM input_table
WHERE GROUPING_ID IS NULL;
CREATE TEMPORARY TABLE temp_all_concat_input
SELECT GROUP_CONCAT(VALUE SEPARATOR '') AS CONCAT_VALUE, GROUPING_ID AS TEMP_GROUPING_ID
FROM input_table
GROUP BY GROUPING_ID;
CREATE TEMPORARY TABLE temp_re_split_input AS
SELECT SUBSTRING(CONCAT_VALUE, (@chunk_size * (seq_numbers.SEQ - 1)) + 1, @chunk_size) AS CHUNK,
concat_input.TEMP_GROUPING_ID,
seq_numbers.SEQ
FROM (SELECT 1 AS SEQ UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL
SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL
SELECT 9 UNION ALL SELECT 10 UNION ALL SELECT 11 UNION ALL SELECT 12 UNION ALL
SELECT 13 UNION ALL SELECT 14 UNION ALL SELECT 15 UNION ALL SELECT 16 UNION ALL
SELECT 17 UNION ALL SELECT 18 UNION ALL SELECT 19 UNION ALL SELECT 20 UNION ALL
SELECT 21 UNION ALL SELECT 22 UNION ALL SELECT 23 UNION ALL SELECT 24 UNION ALL
SELECT 25 UNION ALL SELECT 26 UNION ALL SELECT 27 UNION ALL SELECT 28 UNION ALL
SELECT 29 UNION ALL SELECT 30 UNION ALL SELECT 31 UNION ALL SELECT 32 UNION ALL
SELECT 33 UNION ALL SELECT 34 UNION ALL SELECT 35 UNION ALL SELECT 36 UNION ALL
SELECT 37 UNION ALL SELECT 38 UNION ALL SELECT 39 UNION ALL SELECT 40 UNION ALL
SELECT 41 UNION ALL SELECT 42 UNION ALL SELECT 43 UNION ALL SELECT 44 UNION ALL
SELECT 45 UNION ALL SELECT 46 UNION ALL SELECT 47 UNION ALL SELECT 48 UNION ALL
SELECT 49 UNION ALL SELECT 50) AS seq_numbers
CROSS JOIN (SELECT CONCAT_VALUE, TEMP_GROUPING_ID FROM temp_all_concat_input) AS concat_input
WHERE (@chunk_size * (seq_numbers.SEQ - 1)) + 1 <= CHAR_LENGTH(concat_input.CONCAT_VALUE)
ORDER BY concat_input.TEMP_GROUPING_ID, seq_numbers.SEQ;
TRUNCATE TABLE input_table;
INSERT INTO input_table(CHUNK_ID, VALUE, GROUPING_ID)
SELECT ROW_NUMBER() OVER (), CHUNK, TEMP_GROUPING_ID
FROM temp_re_split_input;
DROP TABLE temp_re_split_input;
DROP TABLE temp_all_concat_input;
-- note: a sequence generator table is not required; this is just an example showing that,
-- if CHUNK_ID is not auto-incremented, you need to update your sequence mechanism accordingly
UPDATE table_sequence_generator
SET ID_VAL = (SELECT COALESCE(MAX(CHUNK_ID) + 1, 1) FROM input_table)
WHERE ID_NAME = 'InputTableSequence';
COMMIT;
SET group_concat_max_len = 1024;
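After the commit, the result can be verified with a quick aggregate over input_table; per group, the total length should be unchanged and no chunk should exceed the new limit:

```sql
SELECT GROUPING_ID,
       COUNT(*)                AS chunk_count,
       MAX(CHAR_LENGTH(VALUE)) AS longest_chunk, -- should be <= 4000
       SUM(CHAR_LENGTH(VALUE)) AS total_length   -- unchanged by the re-split
FROM input_table
GROUP BY GROUPING_ID;
```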
To increase the chunk size to 6000, just change the @chunk_size variable: SET @chunk_size = 6000;
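On MySQL 8.0+, the 50-row UNION chain can be replaced by a recursive CTE, which is shorter and makes the maximum chunk count easier to change; this is a sketch of that variant, not part of the solution above:

```sql
WITH RECURSIVE seq_numbers (SEQ) AS (
    SELECT 1
    UNION ALL
    SELECT SEQ + 1 FROM seq_numbers WHERE SEQ < 50 -- raise 50 for longer strings
)
SELECT SEQ FROM seq_numbers;
```

Remember to raise group_concat_max_len accordingly if you increase the row limit.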
Notes:
- @chunk_size is configurable
- all the values in input_table where GROUPING_ID is null are deleted
- new CHUNK_ID values are generated
- CHAR_LENGTH is used instead of the LENGTH function to also support 2-byte special characters
- VARCHAR was used to store long strings (this was an external requirement), but the type of the VALUE column can be replaced by TEXT, MEDIUMTEXT or LONGTEXT; then you will not face this problem with chunks, the input_table will not be necessary, and the column can be used directly in the original table