Search code examples
mysqlperformancesearchentitysimilarity

Fulltext Search in MySQL


I am new to "searching" in MySQL and I have some tasks, which I don't know how to achieve them the best way.

I have the following MySQL-Table in my Database.

delimiter $$

CREATE TABLE `authors` (
  `id` int(11) NOT NULL,
  `name` varchar(100) COLLATE utf8_unicode_ci NOT NULL,
  `count` int(11) NOT NULL DEFAULT '1',
  PRIMARY KEY (`id`),
  UNIQUE KEY `name_UNIQUE` (`name`),
  FULLTEXT KEY `name_fulltext` (`name`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci$$

The task is: In a html form I want to enter a name, let's say "John Doe". This Name should be looked up in this table, in the column "name". I want the query to return all similar names like "John Due" or "John Doé" and so on. So the user can pick from a list the right name. Sometimes People want to look up a name like "John van Doe" (Netherland style). This should be displayed too in the list.

How is this best achieved? Or should I better ask. Is this possible? =) I am using a python cgi-script by the way, so any modules python can contribute are available.

Another question is: How do I look up Just "John" or "Dow"? Every name which has "John" in it shall be displayed. I tryed "WHERE name LIKE "John"" but this is way too slow. Is there a faster way?

Thank you for any suggestions.


Solution

  • The task is: In a html form I want to enter a name, let's say "John Doe". This Name should be looked up in this table, in the column "name". I want the query to return all similar names like "John Due" or "John Doé" and so on. So the user can pick from a list the right name. Sometimes People want to look up a name like "John van Doe" (Netherland style). This should be displayed too in the list.

    MySQL does not support synonym dictionaries, so you should provide one yourself.

    Yahoo API provides a spell correction service which you can use by submitting a query similar to this:

    SELECT  *
    FROM    search.spelling
    WHERE   query='juhn doe'
    

    using this URL:

    http://query.yahooapis.com/v1/public/yql?q=SELECT%20%20*%20%20FROM%20search.spelling%20WHERE%20query%20%3D%20'juhn%20doe'&format=json&diagnostics=true&callback=cbfunc
    

    As soon as you receive a list of synonyms, you may search for them using this query to MySQL:

    SELECT  *
    FROM    authors
    WHERE   MATCH(name) AGAINST ('(+juhn +doe) (+john +doe)' IN BOOLEAN MODE)
    

    John Doé will be returned by this since you are using UTF8_GENERAL_CI which is case and accent insensitive.

    If you want to look just for John, use this query:

    SELECT  *
    FROM    authors
    WHERE   MATCH(name) AGAINST ('+john' IN BOOLEAN MODE)
    

    Also, make sure you have parameter ft_min_word_len set to a reasonable value (1 is best) in my.cnf.

    Default is 4 which means that no three-letter surnames (like Doe) will be indexed.