Search code examples
phpmysqlsqlsortingkeyword-search

Sort sql result by occurence of a set of keywords


I'm trying to do a little search engine for my website. First of all, user type in some keywords and go to result page that use this code :

$result = array();
$keyword_tokens = explode(' ', $keywords);
$keyword_tokens = array_map(
    function($keywords) {
        return mysql_real_escape_string(trim($keywords));
    }, 
    $keyword_tokens
);
$sql = "SELECT * FROM search_table WHERE concat(title, description) LIKE '%";
$sql .= implode("%' OR concat(title, description) LIKE '%", $keyword_tokens) . "%' ORDER BY instr(description, '$keywords') DESC, instr(title, '$keywords') DESC";
$result=mysql_query($sql);

This code allows to search for keywords requested by user and sort by $keywords, so by the full exact string of group of keywords...

What I'm trying to do is to order results by the most occurence of each keyword. For example, if a row of my sql result contains 5 keywords and other one 3 etc.. the 5 keywords should come up first. I am searching to sort my results by the more keywords matching.

Hope everything is understandable...

Help will be really appreciated !


Solution

  • Lovely thing about MySQL, depending on your point of view. It treats data how it thinks you want it to be treated. 1 + '1'? pretty clear you want to treat the string like a number, so it happily performs the addition for you.

    Works for boolean operands too (just have to watch the operator precedence). (a like '%foo%') + (a like '%bar%') recognises a numeric context, treats the boolean result true as 1 and the boolean result false as 0 - essentially counting the keyword matches for you.

    How can you use that? Take your where clause, replace all the or with +, and make sure each individual like statement is wrapped in parenthesis, then order by it.

    eg:

    order by (concat(title, description) like '%keyword1%') 
      + (concat(title, description) like '%keyword2%') desc
    

    You can accomplish something similar using a fulltext index, but the weighting can be a little weird. Syntax for that would be:

    create fulltext index idx on search_table(title, description)
    

    ^^ just do that bit once.

    select *
      from search_table
      where match(title, description) against ('keyword1 keyword2 ...')
      order by match(title, description) against ('keyword1 keyword2 ...') desc
    

    This has the severe bonus of being far less annoying to construct a query for.

    Here's a proof of concept for you that demonstrates both methods. (albeit only against a single column - but it gets the point across)