Search code examples
mysqlsearchrelevance

How to sort keyword search results by hit frequency?


I've built a search index (two column table) which assigns single keywords to their content ID.

Now searching for single keywords works and is damn fast.

However, searching for multiple keywords like SELECT media_id, keyword FROM search_index WHERE keyword = 'b' OR keyword = 'a' will return results in alphabetical keyword order (all hits on 'a' first, then 'b').

I tried doing it with PHP on the results page, but that requires me to load a lot more data from the database than I want to display, to get an acceptable sorting. With that technique the script might even time out on a keywords that been assigned more frequently.

I also tried GROUP BY which seemed to group the result from the alphabetical order together to their single IDs.

Edit: Lastly I found something like SELECT DISTINCT media_id, keyword, COUNT(media_id) AS num FROM search_index GROUP BY media_id ORDER BY num DESC, which works kinda well but is really slow.

So if i search for 'b' and 'a', I want the IDs with 'a' and 'b' first in the result.

So how can I tell MySQL to sort the output by the frequency of hits on one ID while still being damn fast?


Solution

  • Okay I figured it out myself.

    The fastest seems to be to do SELECT media_id, keyword, COUNT(media_id) AS num WHERE ... GROUP BY media_id and then going over the num field with a simple php-usort function.

    function SortFrequency($a, $b)
    {
         if ($a['num'] == $b['num']) return 0;
         return ($a['num'] > $b['num']) ? -1 : 1;
    }
    
    usort($results, 'SortFrequency');