Search code examples
mysqlstringvarchar

MySQL Query for selecting most common substrings separated by whitespace in a column?


I have a MySQL database of recipes. One of the columns, NAME, is a varchar(255). Each NAME can be multiple words (e.g., "Chocolate Chip Cookies" and "Oatmeal Cookies").

I'm looking for a way to efficiently parse the recipe names into individual words, then return a list of the most common words. So, given the two recipes above, the query would return:

cookies, 2 chocolate, 1 chip, 1 oatmeal, 1

Is there an efficient way to do this? I could query all the results, split each recipe name into words, then build a data structure to do this outside the database. Alternatively, I could split the strings up ahead of time and build a table.

I guess what I am really hoping for is to leverage the database to do this faster (like with an index). I don't know if that's possible with a single query.

So, am I stuck building a table ahead of time? Or can this be done efficiently with a single query?

Thanks!


Solution

  • If your table uses the MyISAM storage engine and you build a FULLTEXT index over your NAME column, you could use myisam_ftdump's --count option to obtain an overall count of each word. Sadly, however, this would be outside of SQL.