
Best way to pull similar items from a MySQL database


I have a database of items (articles, specifically).
I'd like to pull X items that are similar to a specific item, based on two things: the title of the article, and its tags, which are stored in another table.

The structure is as follows (relevant fields only):

Table: article
Fields: articleid, title

Table: tag
Fields: tagid, tagtext

Table: articletag
Fields: tagid, articleid

What would be the best way to do this?


Solution

  • For MyISAM tables (and InnoDB tables from MySQL 5.6 onward) you can use natural language full-text search: http://dev.mysql.com/doc/refman/5.5/en/fulltext-natural-language.html

    SELECT a.* FROM article a
    LEFT JOIN articletag at ON (at.articleid = a.articleid)
    LEFT JOIN tag t ON (at.tagid = t.tagid)
    WHERE MATCH (a.title) AGAINST ('some title' IN NATURAL LANGUAGE MODE)
    OR MATCH (t.tagtext) AGAINST ('some tag' IN NATURAL LANGUAGE MODE)
    GROUP BY a.articleid # one row per article, even if several of its tags match
    
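    The query above assumes FULLTEXT indexes already exist on article.title and tag.tagtext; in natural language mode, MATCH ... AGAINST raises an error without them. A minimal sketch of creating them (the index names ft_title and ft_tagtext are made up for illustration):

    ```sql
    -- Hypothetical index names; pick whatever fits your naming scheme.
    -- FULLTEXT indexes need MyISAM tables (or InnoDB on MySQL 5.6+).
    ALTER TABLE article ADD FULLTEXT INDEX ft_title (title);
    ALTER TABLE tag ADD FULLTEXT INDEX ft_tagtext (tagtext);
    ```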

    You could also store the tags redundantly in a single field of the article table (e.g. <taga><tagb><tagz>) and update it each time a tag is added or removed. This simplifies the query and should make it faster:

    SELECT * FROM article a
    WHERE MATCH (a.title, a.tagtext) AGAINST ('some title or tag' IN NATURAL LANGUAGE MODE)
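
    One possible way to set up and maintain that redundant field, as a sketch only. It assumes the field is named tagtext (as in the query above), stores the tags separated by spaces rather than wrapped in angle brackets, and uses a made-up index name, ft_title_tags. The FULLTEXT index has to cover exactly the columns listed in MATCH().

    ```sql
    -- Assumed column and index names; adjust the column type to your data.
    ALTER TABLE article ADD COLUMN tagtext TEXT;
    ALTER TABLE article ADD FULLTEXT INDEX ft_title_tags (title, tagtext);

    -- Rebuild the redundant tag list for one article after its tags change.
    -- 42 is a placeholder articleid.
    UPDATE article a
    SET a.tagtext = (
        SELECT GROUP_CONCAT(t.tagtext SEPARATOR ' ')
        FROM articletag at
        JOIN tag t ON t.tagid = at.tagid
        WHERE at.articleid = a.articleid
    )
    WHERE a.articleid = 42;
    ```

    Running the UPDATE from application code (or a trigger on articletag) whenever a tag is attached or detached keeps the field in sync. Note that GROUP_CONCAT truncates its result at group_concat_max_len, which defaults to 1024 bytes.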