
How to find the most popular word occurrences in MySQL?


I have a table called results with 5 columns.

I'd like to filter rows on the title column, say with WHERE title LIKE '%for sale%', and then list the most popular words in the matching titles. Two of them will obviously be "for" and "sale", but I want to see which other words correlate with that phrase.

Sample data:

title
cheap cars for sale
house for sale
cats and dogs for sale
iphones and androids for sale
cheap phones for sale
house furniture for sale

Expected results (single-word counts):

word       count
for        6
sale       6
cheap      2
and        2
house      2
furniture  1
cars       1
etc.
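The expected counts can be cross-checked outside the database. Below is a minimal Python sketch, assuming the sample titles above and that words are separated by single spaces. `word_counts` is a hypothetical helper, and lower-casing both sides only approximates the case-insensitive matching that LIKE usually gets from MySQL's default collations.

```python
from collections import Counter

# Sample titles copied from the question.
TITLES = [
    "cheap cars for sale",
    "house for sale",
    "cats and dogs for sale",
    "iphones and androids for sale",
    "cheap phones for sale",
    "house furniture for sale",
]


def word_counts(titles, phrase="for sale"):
    """Hypothetical helper: count single words in titles containing `phrase`.

    The substring test mirrors WHERE title LIKE '%for sale%' (case-insensitively),
    and split(" ") splits on single spaces, as the SQL answer assumes.
    """
    counts = Counter()
    for title in titles:
        if phrase.lower() in title.lower():
            counts.update(title.split(" "))
    return counts


if __name__ == "__main__":
    # Most frequent first; ties keep first-seen order.
    for word, n in word_counts(TITLES).most_common():
        print(f"{word:<10} {n}")
```

This is only a reference for what the SQL should return on the sample rows; on a real table the counting belongs in the query.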

Solution

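  • You can extract the words with some string manipulation. Assuming you have a numbers table (a table with one integer column, n, holding 1, 2, 3, ... up to at least the largest number of words in any title) and that words are separated by single spaces: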

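    select substring_index(substring_index(r.title, ' ', n.n), ' ', -1) as word,
           count(*) as cnt
    from results r join
         numbers n
         -- number of words in the title = number of spaces + 1
         on n.n <= length(r.title) - length(replace(r.title, ' ', '')) + 1
    where r.title like '%for sale%'
    group by word
    order by cnt desc;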
    

    If you don't have a numbers table, you can construct one manually using a subquery:

    from results r join
         (select 1 as n union all select 2 union all select 3 union all . . .
          -- extend up to the maximum number of words in any title
         ) n
         . . .
    

    The SQL Fiddle (courtesy of @GrzegorzAdamKowalski) is here.
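To see how the nested SUBSTRING_INDEX calls pick out the n-th word, and to try the query without a MySQL server, here is a sketch that runs essentially the query from the answer (with a WHERE filter and an ORDER BY) in SQLite through Python's built-in sqlite3 module. This is an emulation, not MySQL: SQLite has no SUBSTRING_INDEX, so the sketch registers a Python stand-in, `substring_index`, written to follow MySQL's documented behaviour for positive and negative counts. The schema (only an id and the title column), the names `top_words` and `TITLES`, and the default `max_words=10` for the numbers table are all assumptions for illustration.

```python
import sqlite3


def substring_index(s, delim, count):
    """Python stand-in for MySQL's SUBSTRING_INDEX(str, delim, count).

    count > 0: everything left of the count-th delim, counting from the left.
    count < 0: everything right of the |count|-th delim, counting from the right.
    Fewer delimiters than |count|: the whole string.
    """
    if s is None or delim is None or count is None:
        return None  # NULL in, NULL out
    if count == 0 or delim == "":
        return ""  # also avoids str.split() rejecting an empty separator
    parts = s.split(delim)
    return delim.join(parts[:count] if count > 0 else parts[count:])


# Sample titles copied from the question.
TITLES = [
    "cheap cars for sale",
    "house for sale",
    "cats and dogs for sale",
    "iphones and androids for sale",
    "cheap phones for sale",
    "house furniture for sale",
]

QUERY = """
select substring_index(substring_index(r.title, ' ', n.n), ' ', -1) as word,
       count(*) as cnt
from results r join
     numbers n
     -- keep n between 1 and the word count (spaces + 1), so each word is taken once
     on n.n <= length(r.title) - length(replace(r.title, ' ', '')) + 1
where r.title like '%for sale%'
group by word
order by cnt desc, word
"""


def top_words(titles, max_words=10):
    """Hypothetical helper: load titles into an in-memory DB and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("substring_index", 3, substring_index)
    # Assumed minimal schema: only the title column matters for this query.
    conn.execute("create table results (id integer primary key, title text)")
    conn.executemany("insert into results (title) values (?)", [(t,) for t in titles])
    # Numbers table 1..max_words; must cover the longest title's word count.
    conn.execute("create table numbers (n integer primary key)")
    conn.executemany("insert into numbers (n) values (?)",
                     [(i,) for i in range(1, max_words + 1)])
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for word, cnt in top_words(TITLES):
        print(f"{word:<10} {cnt}")
```

With the sample titles, `top_words(TITLES)` should list `for` and `sale` first with 6 each. The ON condition is what keeps the trick correct: for n larger than the word count, the inner call would return the whole title and the outer call would repeat the last word, so those rows must be excluded.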