Suppose I have a table products
with
product | description |
---|---|
apple | yummy green fruit |
banana | nutritional yellow fruit |
Notice that the description
column may contain texts of arbitrary length (LONGTEXT). Now I want to get the table words
with all of the words from the column description
:
words |
---|
yummy |
green |
fruit |
nutritional |
yellow |
All of the answers I have found on StackExchange refer to splitting a string into several columns, not rows. How do I approach this problem?
EDIT: SELECT VERSION();
returns:
VERSION() |
---|
8.0.27 |
Based on Edit :
The description column may contain texts of arbitrary length (LONGTEXT)
The following query shall work for LONGTEXT.
INSERT INTO words
SELECT DISTINCT REGEXP_REPLACE(TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(description, ' ', numbers.n), ' ', -1)), '[^[:alnum:]]', '') AS word
FROM
(SELECT @row := @row + 1 AS n FROM (SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5) t1,
(SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5) t2,
(SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5) t3,
(SELECT @row:=0) t4) numbers
INNER JOIN products
ON CHAR_LENGTH(description) - CHAR_LENGTH(REPLACE(description, ' ', '')) >= numbers.n - 1;
Here is the DBFIDDLE Demo.
The above code shall also remove any comma, quotes, hyphen, semicolon and other special characters before inserting individual words to a table.
If you are fine with duplicate entries (words) in a table; then you remove the distinct clause at start of query.