
What does the term "Stopword" mean in MySQL?


I'm currently studying MySQL commands and got stuck using MATCH ... AGAINST on a FULLTEXT index. It returns an empty set when the search term is a "stopword" (which is "and" in my case).

Here's what I did. The database I'm working on contains a list of books and their authors. I'm trying to select the entries that contain "and" in their title. Here are the contents of my 'classics' table.

+--------------------+------------------------------+
| author             | title                        |
+--------------------+------------------------------+
| Mark Twain         | The Adventures of Tom Sawyer |
| Jane Austen        | Pride and Prejudice          |
| Charles Darwin     | The Origin of Species        |
| Charles Dickens    | The Old Curiosity Shop       |
| William Shakespear | Romeo and Juliet             |
+--------------------+------------------------------+

This is the query I wrote:

SELECT author, title FROM classics
WHERE MATCH(author, title) AGAINST('and');

Empty set (0.00 sec)

I expected the result to be "Pride and Prejudice" and "Romeo and Juliet" instead of "Empty set (0.00 sec)". I now realize that "and" is a stopword.

My question is: what does "stopword" mean, and how do I know which words are stopwords? And what should I do if I really want to select the rows that contain "and" in their title?


Solution

  • My question is: what does "stopword" mean ...

    A stopword is a word that will be ignored when given as a keyword in a full-text search.

    For more information read the Wikipedia page on stopwords.

    MySQL uses the term in a way that is consistent with the normal definition.

    ... and how do I know which word is a stopword?

    For InnoDB tables you can query the INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD table.
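
    As a minimal sketch, the default InnoDB list can be inspected with ordinary SELECT statements. The table has a single column named value; the word tested in the second query is just the one from the question.

    ```sql
    -- List every word in InnoDB's default stopword list.
    SELECT value
    FROM INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD
    ORDER BY value;

    -- Check whether one particular word is in that list.
    -- A count of 0 means InnoDB does not treat it as a default stopword.
    SELECT COUNT(*) AS is_stopword
    FROM INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD
    WHERE value = 'and';
    ```

    Note that this list applies only to InnoDB full-text indexes; a MyISAM table uses a different list.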

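    For MyISAM search indexes, the stopwords come from the built-in default list or from the file named by the ft_stopword_file system variable. It may be possible to read that file at runtime using ordinary file I/O from your application, but the list apparently can't be accessed via a database query.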
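
    Because the stopword list depends on the storage engine, it may help to check which engine the table uses and, for MyISAM, where its stopwords come from. A sketch, assuming the 'classics' table lives in the currently selected database:

    ```sql
    -- Which storage engine does the table use (InnoDB or MyISAM)?
    SELECT TABLE_NAME, ENGINE
    FROM INFORMATION_SCHEMA.TABLES
    WHERE TABLE_SCHEMA = DATABASE()
      AND TABLE_NAME = 'classics';

    -- For MyISAM: the source of the stopword list.
    -- The value is "(built-in)" when the compiled-in default list is in use.
    SHOW VARIABLES LIKE 'ft_stopword_file';

    -- Words shorter than these lengths are not indexed either,
    -- regardless of the stopword list.
    SHOW VARIABLES LIKE 'ft_min_word_len';          -- MyISAM
    SHOW VARIABLES LIKE 'innodb_ft_min_token_size'; -- InnoDB
    ```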

    And what should I do if I really want to select the rows that contain "and" in their title?

    The MySQL documentation explains how to do it; see Section 12.9.4 Full-Text Stopwords. (There is too much detail to copy it here.)

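    My reading is that for MyISAM tables you need to change the configuration (the ft_stopword_file system variable), restart the database server, and then rebuild the full-text indexes. For InnoDB tables you can instead point innodb_ft_server_stopword_table or innodb_ft_user_stopword_table at a stopword table of your own, or turn stopwords off with innodb_ft_enable_stopword; in either case you also need to regenerate the table's full-text index.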
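
    For an InnoDB table, the procedure in the documentation looks roughly like the sketch below. The database name mydb, the table name my_stopwords, and the index name ft_author_title are hypothetical placeholders; substitute your own. The variable is set here with SET GLOBAL (which needs sufficient privileges); putting it in the server configuration file should work as well.

    ```sql
    -- Hypothetical stopword table: it must be InnoDB and have a single
    -- VARCHAR column named "value".
    CREATE TABLE mydb.my_stopwords (value VARCHAR(30)) ENGINE = InnoDB;

    -- Add only the words you still want ignored; "and" is left out on purpose.
    INSERT INTO mydb.my_stopwords (value) VALUES ('the'), ('of');

    -- Tell InnoDB to use this table, in db_name/table_name form.
    SET GLOBAL innodb_ft_server_stopword_table = 'mydb/my_stopwords';

    -- Rebuild the full-text index so the new list takes effect.
    -- ft_author_title is an assumed index name; check SHOW CREATE TABLE classics.
    ALTER TABLE classics DROP INDEX ft_author_title;
    ALTER TABLE classics ADD FULLTEXT INDEX ft_author_title (author, title);
    ```

    The index is dropped and re-created because the stopword list is applied when the index is built, not when the query runs.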

    That means that you cannot change the stopwords for each query ... if that is what you are aiming to do. But you could explicitly query for a stopword using LIKE; e.g.

    SELECT author, title FROM classics
    WHERE title LIKE '% and %';
    

    Because the pattern starts with a wildcard, that query cannot use an ordinary index on title and would probably entail a full table scan, so you want to avoid it if possible.
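
    One way to check how MySQL actually executes the LIKE query is to prefix it with EXPLAIN. This is only a sketch against the table from the question; the exact output columns vary between MySQL versions.

    ```sql
    -- Show the execution plan instead of running the query.
    -- In the output, type = ALL means every row of the table is scanned.
    EXPLAIN
    SELECT author, title FROM classics
    WHERE title LIKE '% and %';
    ```

    On a five-row table the scan costs nothing measurable; it only becomes a concern as the table grows.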