Search code examples
mysqlfull-text-searchtildeboolean-search

`~` (tilde) operator in Boolean Full-Text Search in MySQL is not behaving as stated in MySQL developer website


I have created the following table fruits -

CREATE TABLE `fruits` (
  `id` tinyint unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(200) NOT NULL,
  PRIMARY KEY (`id`),
  FULLTEXT KEY `ft_name` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

Then I entered the following values in table fruits -

SELECT * FROM fruits;
+----+---------------+
| id | name          |
+----+---------------+
|  1 | apple, orange |
|  2 | apple, mango  |
|  3 | mango, kiwi   |
|  4 | mango, guava  |
|  5 | apple, banana |
+----+---------------+

Now I run the following three SQL queries -

Query 1:

SELECT id, name FROM fruits
    -> WHERE MATCH(name) AGAINST
    -> ('+apple' IN BOOLEAN MODE);
+----+---------------+
| id | name          |
+----+---------------+
|  1 | apple, orange |
|  2 | apple, mango  |
|  5 | apple, banana |
+----+---------------+

Query 2:

SELECT id, name FROM fruits
    -> WHERE MATCH(name) AGAINST
    -> ('+apple -orange' IN BOOLEAN MODE);
+----+---------------+
| id | name          |
+----+---------------+
|  2 | apple, mango  |
|  5 | apple, banana |
+----+---------------+

Query 3:

SELECT id, name FROM fruits
    -> WHERE MATCH(name) AGAINST
    -> ('+apple ~orange' IN BOOLEAN MODE);
+----+---------------+
| id | name          |
+----+---------------+
|  1 | apple, orange |
|  2 | apple, mango  |
|  5 | apple, banana |
+----+---------------+

As per MySQL developer website following is the function of ~ (tilde) operator in 'Boolean Full-Text Searches'

https://dev.mysql.com/doc/refman/8.0/en/fulltext-boolean.html

  • '+apple ~macintosh'

Find rows that contain the word “apple”, but if the row also contains the word “macintosh”, rate it lower than if row does not. This is “softer” than a search for '+apple -macintosh', for which the presence of “macintosh” causes the row not to be returned at all.

I have tried the ~ (tilde) operator in 'Query 3' but the output is certainly not what is expected. Here, the expected behavior is row with id = 1 coming at last.

P.S. I am using MySQL version - 8.0.26-0ubuntu0.20.04.2 for Linux on x86_64 ((Ubuntu))


Solution

  • Though there is no documentation regarding my answer anywhere, after thorough experimentation I have arrived at this most logical conclusion -

    Presence of '+' operator nullifies any effect of '~' operator

    I have updated my table fruits with the following values -

    SELECT * FROM fruits;
    +----+-------------------------+
    | id | name                    |
    +----+-------------------------+
    |  1 | apple orange watermelon |
    |  2 | apple mango pomegranate |
    |  3 | apple mango banana      |
    |  4 | mango kiwi pomegranate  |
    |  5 | mango guava watermelon  |
    |  6 | apple banana kiwi       |
    +----+-------------------------+
    

    Query 1:

    SELECT id, name FROM fruits
        -> WHERE MATCH(name) AGAINST
        -> ('apple mango ~pomegranate'
        -> IN BOOLEAN MODE);
    +----+-------------------------+
    | id | name                    |
    +----+-------------------------+
    |  3 | apple mango banana      |
    |  1 | apple orange watermelon |
    |  5 | mango guava watermelon  |
    |  6 | apple banana kiwi       |
    |  2 | apple mango pomegranate |
    |  4 | mango kiwi pomegranate  |
    +----+-------------------------+
    

    Query 2:

    SELECT id, name FROM fruits
        -> WHERE MATCH(name) AGAINST
        -> ('apple ~pomegranate'
        -> IN BOOLEAN MODE);
    +----+-------------------------+
    | id | name                    |
    +----+-------------------------+
    |  1 | apple orange watermelon |
    |  3 | apple mango banana      |
    |  6 | apple banana kiwi       |
    |  2 | apple mango pomegranate |
    +----+-------------------------+
    

    Query 3:

    SELECT id, name FROM fruits
        -> WHERE MATCH(name) AGAINST
        -> ('mango ~pomegranate'
        -> IN BOOLEAN MODE);
    +----+-------------------------+
    | id | name                    |
    +----+-------------------------+
    |  3 | apple mango banana      |
    |  5 | mango guava watermelon  |
    |  2 | apple mango pomegranate |
    |  4 | mango kiwi pomegranate  |
    +----+-------------------------+
    

    Here, in the queries 1, 2 & 3 no operator precedes the values apple and mango and ~ operator precedes the value pomegranate. This makes sure that the rows having the word pomegranate are ranked lower than others.

    Query 4:

    SELECT id, name FROM fruits
        -> WHERE MATCH(name) AGAINST
        -> ('+apple +mango ~pomegranate'
        -> IN BOOLEAN MODE);
    +----+-------------------------+
    | id | name                    |
    +----+-------------------------+
    |  2 | apple mango pomegranate |
    |  3 | apple mango banana      |
    +----+-------------------------+
    

    Query 5:

    SELECT id, name FROM fruits
        -> WHERE MATCH(name) AGAINST
        -> ('+apple ~pomegranate'
        -> IN BOOLEAN MODE);
    +----+-------------------------+
    | id | name                    |
    +----+-------------------------+
    |  1 | apple orange watermelon |
    |  2 | apple mango pomegranate |
    |  3 | apple mango banana      |
    |  6 | apple banana kiwi       |
    +----+-------------------------+
    

    Query 6:

    SELECT id, name FROM fruits
        -> WHERE MATCH(name) AGAINST
        -> ('+mango ~pomegranate'
        -> IN BOOLEAN MODE);
    +----+-------------------------+
    | id | name                    |
    +----+-------------------------+
    |  2 | apple mango pomegranate |
    |  3 | apple mango banana      |
    |  4 | mango kiwi pomegranate  |
    |  5 | mango guava watermelon  |
    +----+-------------------------+
    

    Here, in the queries 4, 5 & 6 + operator precedes the values apple and mango and ~ operator precedes the value pomegranate. Clearly the presence of + operator nullifies any effect of ~ operator.

    Query 7:

    SELECT id, name FROM fruits
        -> WHERE MATCH(name) AGAINST
        -> ('+apple +mango <pomegranate'
        -> IN BOOLEAN MODE);
    +----+-------------------------+
    | id | name                    |
    +----+-------------------------+
    |  3 | apple mango banana      |
    |  2 | apple mango pomegranate |
    +----+-------------------------+
    

    Query 8:

    SELECT id, name FROM fruits
        -> WHERE MATCH(name) AGAINST
        -> ('+apple <pomegranate'
        -> IN BOOLEAN MODE);
    +----+-------------------------+
    | id | name                    |
    +----+-------------------------+
    |  1 | apple orange watermelon |
    |  3 | apple mango banana      |
    |  6 | apple banana kiwi       |
    |  2 | apple mango pomegranate |
    +----+-------------------------+
    

    Query 9:

    SELECT id, name FROM fruits
        -> WHERE MATCH(name) AGAINST
        -> ('+mango <pomegranate'
        -> IN BOOLEAN MODE);
    +----+-------------------------+
    | id | name                    |
    +----+-------------------------+
    |  3 | apple mango banana      |
    |  5 | mango guava watermelon  |
    |  2 | apple mango pomegranate |
    |  4 | mango kiwi pomegranate  |
    +----+-------------------------+
    

    Here, in the queries 7, 8 & 9 + operator precedes the values apple and mango and < operator precedes the value pomegranate. This makes sure that the rows having the word pomegranate are ranked lower than others.

    Thus, what can be deduced from here is that - if + operator is present, use < operator instead of ~ operator


    UPDATE

    Upon extensive calculation I have created the table fruits_score_count which shows the score of each fruit when done Boolean FULLTEXT search.

    SELECT * FROM fruits_score_count;
    +----+-------------+---------------------+----------------------+
    | id | fruit_name  | row_numbers_matched | score                |
    +----+-------------+---------------------+----------------------+
    |  1 | apple       |                   4 | 0.031008131802082062 |
    |  2 | banana      |                   2 |  0.22764469683170319 |
    |  3 | guava       |                   1 |   0.6055193543434143 |
    |  4 | kiwi        |                   2 |  0.22764469683170319 |
    |  5 | mango       |                   4 | 0.031008131802082062 |
    |  6 | orange      |                   1 |   0.6055193543434143 |
    |  7 | pomegranate |                   2 |  0.22764469683170319 |
    |  8 | watermelon  |                   2 |  0.22764469683170319 |
    +----+-------------+---------------------+----------------------+
    

    Query 1:

    SELECT id, name, score FROM
        -> (SELECT id, name, MATCH(name) AGAINST
        -> ('apple mango ~pomegranate' IN BOOLEAN MODE)
        -> AS score FROM fruits ORDER BY score DESC)
        -> AS temp WHERE score != 0;
    +----+-------------------------+----------------------+
    | id | name                    | score                |
    +----+-------------------------+----------------------+
    |  3 | apple mango banana      | 0.062016263604164124 |
    |  1 | apple orange watermelon | 0.031008131802082062 |
    |  5 | mango guava watermelon  | 0.031008131802082062 |
    |  6 | apple banana kiwi       | 0.031008131802082062 |
    |  2 | apple mango pomegranate |  -0.7103390693664551 |
    |  4 | mango kiwi pomegranate  |  -0.7413471937179565 |
    +----+-------------------------+----------------------+
    

    Query 2:

    SELECT id, name, score FROM
        -> (SELECT id, name, MATCH(name) AGAINST
        -> ('apple ~pomegranate' IN BOOLEAN MODE)
        -> AS score FROM fruits ORDER BY score DESC)
        -> AS temp WHERE score != 0;
    +----+-------------------------+----------------------+
    | id | name                    | score                |
    +----+-------------------------+----------------------+
    |  1 | apple orange watermelon | 0.031008131802082062 |
    |  3 | apple mango banana      | 0.031008131802082062 |
    |  6 | apple banana kiwi       | 0.031008131802082062 |
    |  2 | apple mango pomegranate |  -0.7413471937179565 |
    +----+-------------------------+----------------------+
    

    Query 3:

    SELECT id, name, score FROM
        -> (SELECT id, name, MATCH(name) AGAINST
        -> ('mango ~pomegranate' IN BOOLEAN MODE)
        -> AS score FROM fruits ORDER BY score DESC)
        -> AS temp WHERE score != 0;
    +----+-------------------------+----------------------+
    | id | name                    | score                |
    +----+-------------------------+----------------------+
    |  3 | apple mango banana      | 0.031008131802082062 |
    |  5 | mango guava watermelon  | 0.031008131802082062 |
    |  2 | apple mango pomegranate |  -0.7413471937179565 |
    |  4 | mango kiwi pomegranate  |  -0.7413471937179565 |
    +----+-------------------------+----------------------+
    

    Query 4:

    SELECT id, name, score FROM
        -> (SELECT id, name, MATCH(name) AGAINST
        -> ('+apple +mango ~pomegranate' IN BOOLEAN MODE)
        -> AS score FROM fruits ORDER BY score DESC)
        -> AS temp WHERE score != 0;
    +----+-------------------------+----------------------+
    | id | name                    | score                |
    +----+-------------------------+----------------------+
    |  2 | apple mango pomegranate | 0.062016263604164124 |
    |  3 | apple mango banana      | 0.062016263604164124 |
    +----+-------------------------+----------------------+
    

    Query 5:

    SELECT id, name, score FROM
        -> (SELECT id, name, MATCH(name) AGAINST
        -> ('+apple ~pomegranate' IN BOOLEAN MODE)
        -> AS score FROM fruits ORDER BY score DESC)
        -> AS temp WHERE score != 0;
    +----+-------------------------+----------------------+
    | id | name                    | score                |
    +----+-------------------------+----------------------+
    |  1 | apple orange watermelon | 0.031008131802082062 |
    |  2 | apple mango pomegranate | 0.031008131802082062 |
    |  3 | apple mango banana      | 0.031008131802082062 |
    |  6 | apple banana kiwi       | 0.031008131802082062 |
    +----+-------------------------+----------------------+
    

    Query 6:

    SELECT id, name, score FROM
        -> (SELECT id, name, MATCH(name) AGAINST
        -> ('+mango ~pomegranate' IN BOOLEAN MODE)
        -> AS score FROM fruits ORDER BY score DESC)
        -> AS temp WHERE score != 0;
    +----+-------------------------+----------------------+
    | id | name                    | score                |
    +----+-------------------------+----------------------+
    |  2 | apple mango pomegranate | 0.031008131802082062 |
    |  3 | apple mango banana      | 0.031008131802082062 |
    |  4 | mango kiwi pomegranate  | 0.031008131802082062 |
    |  5 | mango guava watermelon  | 0.031008131802082062 |
    +----+-------------------------+----------------------+
    

    Query 7:

    SELECT id, name, score FROM
        -> (SELECT id, name, MATCH(name) AGAINST
        -> ('+apple +mango <pomegranate' IN BOOLEAN MODE)
        -> AS score FROM fruits ORDER BY score DESC)
        -> AS temp WHERE score != 0;
    +----+-------------------------+----------------------+
    | id | name                    | score                |
    +----+-------------------------+----------------------+
    |  3 | apple mango banana      | 0.062016263604164124 |
    |  2 | apple mango pomegranate |  -0.7103390693664551 |
    +----+-------------------------+----------------------+
    

    Query 8:

    SELECT id, name, score FROM
        -> (SELECT id, name, MATCH(name) AGAINST
        -> ('+apple <pomegranate' IN BOOLEAN MODE)
        -> AS score FROM fruits ORDER BY score DESC)
        -> AS temp WHERE score != 0;
    +----+-------------------------+----------------------+
    | id | name                    | score                |
    +----+-------------------------+----------------------+
    |  1 | apple orange watermelon | 0.031008131802082062 |
    |  3 | apple mango banana      | 0.031008131802082062 |
    |  6 | apple banana kiwi       | 0.031008131802082062 |
    |  2 | apple mango pomegranate |  -0.7413471937179565 |
    +----+-------------------------+----------------------+
    

    Query 9:

    SELECT id, name, score FROM
        -> (SELECT id, name, MATCH(name) AGAINST
        -> ('+mango <pomegranate' IN BOOLEAN MODE)
        -> AS score FROM fruits ORDER BY score DESC)
        -> AS temp WHERE score != 0;
    +----+-------------------------+----------------------+
    | id | name                    | score                |
    +----+-------------------------+----------------------+
    |  3 | apple mango banana      | 0.031008131802082062 |
    |  5 | mango guava watermelon  | 0.031008131802082062 |
    |  2 | apple mango pomegranate |  -0.7413471937179565 |
    |  4 | mango kiwi pomegranate  |  -0.7413471937179565 |
    +----+-------------------------+----------------------+
    

    Here, Query 1, Query 2, Query 3, Query 7, Query 8, Query 9 behaves as expected.

    But from Query 4, Query 5, Query 6 it is clear that -

    In the presence of + operator preceding a value with the ~ operator basically makes the value invisible.

    Also careful observation reveals that -

    x ~y and +x <y are equivalent


    FURTHER EXPERIMENTATION

    Query 1:

    SELECT id, name, score FROM
        -> (SELECT id, name, MATCH(name) AGAINST
        -> ('+mango apple ~pomegranate' IN BOOLEAN MODE)
        -> AS score FROM fruits ORDER BY score DESC)
        -> AS temp WHERE score != 0;
    +----+-------------------------+----------------------+
    | id | name                    | score                |
    +----+-------------------------+----------------------+
    |  3 | apple mango banana      | 0.062016263604164124 |
    |  4 | mango kiwi pomegranate  | 0.031008131802082062 |
    |  5 | mango guava watermelon  | 0.031008131802082062 |
    |  2 | apple mango pomegranate |  -0.7103390693664551 |
    +----+-------------------------+----------------------+
    
    • Row 1 with id = 3 gets maximum score which is the sum of scores of mango and apple.
    • Row 2 with id = 4 gets second maximum score which is the score of mango. Presence of + operator in front of mango makes ~pomegranate of search phrase irrelevant.
    • Row 3 with id = 5 gets the same score as that of Row 2. But it is placed lower than Row 2 because when scores are same, rows are ranked in increasing order of primary key, here id is primary key.
    • Row 4 with id = 2 gets lowest score and hence comes last. Here since the word apple is present and in the search phrase there is no + operator preceding apple, hence ~pomegranate in search phrase is taken into consideration, which lowers the score significantly.

    Query 2:

    SELECT id, name, score FROM
        -> (SELECT id, name, MATCH(name) AGAINST
        -> ('+mango apple <pomegranate' IN BOOLEAN MODE)
        -> AS score FROM fruits ORDER BY score DESC)
        -> AS temp WHERE score != 0;
    +----+-------------------------+----------------------+
    | id | name                    | score                |
    +----+-------------------------+----------------------+
    |  3 | apple mango banana      | 0.062016263604164124 |
    |  5 | mango guava watermelon  | 0.031008131802082062 |
    |  2 | apple mango pomegranate |  -0.7103390693664551 |
    |  4 | mango kiwi pomegranate  |  -0.7413471937179565 |
    +----+-------------------------+----------------------+
    

    This again illustrates that < operator takes effect even in the presence of + operator.

    This further reinforces my earlier observation that -

    if + operator is present, use < operator instead of ~ operator