I have a local env with Mysql 5.7.19 (on windows 10 Pro French) and a prod server with Mysql 5.7.31 (Ubuntu Linux 16.04.5).
The data is synchronised from Prod to local ENV. I have a fullText index on 3 columns and a simple request :
SELECT MATCH (r0_.title, r0_.description, r0_.tag_text)
AGAINST ('+poulet* +carotte*' IN BOOLEAN MODE) AS sclr_0,
r0_.id AS id_1, r0_.title AS title_2, r0_.description AS description_3,
r0_.url AS url_4, r0_.image AS image_5, r0_.slug AS slug_6, r0_.click AS click_7, r0_.tag_text AS tag_text_8, r0_.active AS active_9, r0_.created_at AS created_at_10, r0_.updated_at AS updated_at_11
FROM recipe r0_
WHERE r0_.active = 1
HAVING sclr_0 >= 1
ORDER BY sclr_0 DESC;
On local env => 98 results
On prod env => 0 result
Create schema :
CREATE TABLE `recipe` (
`id` int(11) NOT NULL,
`blog_id` int(11) NOT NULL,
`title` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`description` longtext COLLATE utf8mb4_unicode_ci NOT NULL,
`url` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`image` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`slug` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`click` smallint(6) NOT NULL,
`created_at` datetime NOT NULL,
`updated_at` datetime NOT NULL,
`tag_text` varchar(1000) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`active` tinyint(1) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
ALTER TABLE `recipe`
ADD PRIMARY KEY (`id`),
ADD KEY `IDX_DA88B137DAE07E97` (`blog_id`),
ADD KEY `IDX_DA88B1374B1EFC02` (`active`),
ADD KEY `IDX_DA88B1378B8E8428` (`created_at`);
ALTER TABLE `recipe` ADD FULLTEXT KEY `IDX_DA88B1372B36786B6DE44026D5841871`
(`title`,`description`,`tag_text`);
More data on Prod actually because new recipe but no result.
The relevancy score is calculated based on the content of the complete table:
InnoDB uses a variation of the “term frequency-inverse document frequency” (TF-IDF) weighting system to rank a document's relevance for a given full-text search query. The TF-IDF weighting is based on how frequently a word appears in a document, offset by how frequently the word appears in all documents in the collection. In other words, the more frequently a word appears in a document, and the less frequently the word appears in the document collection, the higher the document is ranked.
"Document" here means a single row, "document collection" means all rows. The manual contains the exact formula, but the important thing is: since you have more recipes on prod compared to local, the score will be different. If you e.g. added more recipes containing carottes, the score will go down, if you added recipes that don't contain your search terms, the score will go up.
This is completely independent of how good the single result on its own actually is! A chicken carrot stew is a good fit to your search, but the absolute score will vary if you also have a recipe for carrot cake in your database or not.
So the absolute value of the score itself is usually not a good criteria to filter on, e.g. with your where score > 1
, but as a way to order the results you get, e.g. with order by score desc
, usually including a limit
.
It is unlikely that you will find a good absolute minimum value for your score (except for 0) that would make sense in general:
But a recipe that uses the word sugar
very often (as it may be an important ingredient, maybe a recipe for caramel) will have a higher score than those that only mentions it once ("add some sugar"), so you can use the value to order your results relative to each other.