Search code examples
mysqltext

Splitting text into words in MySQL


Suppose I have a table products with

product description
apple yummy green fruit
banana nutritional yellow fruit

Notice that the description column may contain texts of arbitrary length (LONGTEXT). Now I want to get the table words with all of the words from the column description:

words
yummy
green
fruit
nutritional
yellow

All of the answers I have found on StackExchange refer to splitting a string into several columns, not rows. How do I approach this problem?

EDIT: SELECT VERSION(); returns:

VERSION()
8.0.27

Solution

  • Based on Edit :

    The description column may contain texts of arbitrary length (LONGTEXT)

    The following query shall work for LONGTEXT.

    INSERT INTO words
    SELECT DISTINCT REGEXP_REPLACE(TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(description, ' ', numbers.n), ' ', -1)), '[^[:alnum:]]', '') AS word
    FROM
      (SELECT @row := @row + 1 AS n FROM (SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5) t1,
      (SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5) t2,
      (SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5) t3,
      (SELECT @row:=0) t4) numbers
      INNER JOIN products
      ON CHAR_LENGTH(description) - CHAR_LENGTH(REPLACE(description, ' ', '')) >= numbers.n - 1;
    

    Here is the DBFIDDLE Demo.

    The above code shall also remove any comma, quotes, hyphen, semicolon and other special characters before inserting individual words to a table.

    If you are fine with duplicate entries (words) in a table; then you remove the distinct clause at start of query.