I searched for a search engine, but the price is expensive or it seems to take a lot of time to build, so I want to save it through 'singer name music name' and use it as a fullText-index.
However, even if I keep searching, it is constantly compared with like, and only the advantages of the function are introduced, and the disadvantages are hard to see even if I look for them.
Currently, there is no big problem except that the index grows when about 30,000 data is created, so I am going to use it, but I wonder what the downsides of fulltext-index are!
If you search additionally, they say that it is better to install a morpheme analyzer, but I plan to use an ngram parser with ngram=1.
Addition) If too large an index can be a problem, I wonder how to increase the ngram size and search like like for 2 letters or less.
(When it is well suited, FULLTEXT is very fast; meanwhile...)
Downsides of FULLTEXT
. Note that many can be worked around, but it takes deliberate extra code.
AGAINST ("+john +q +smith" IN BOOLEAN MODE)
will always fail. This, on the other hand, will succeed, though it may deliver too many rows: AGAINST ("+john q +smith" IN BOOLEAN MODE)
. That is, "John W Smith" will succeed. See "fine tuning", below.Sanitizing the input. Part numbers and model numbers are a nuisance to deal with because of the liberties that manufacturers take. Example: "Model 3" -- "Model" is a nice fulltext 'word', but '3' is not. You could [perhaps] tweak the input (when INSERTing) to say "Model3". Another example "xyz-123". Another: "John Q. Smith"
Combining columns. It may be beneficial to put together likely FT strings (from many columns) and throw them into a single column that is purely for searching. Do the sanitizing as you insert the data.
Fine-tuning. Here's one way to solve some of the hiccups:
WHERE MATCH(texts) AGAINST ("+john q +smith" IN BOOLEAN MODE)
AND texts LIKE "%John Q. Smith%"
Explanation... The FT test will come first and be fast. Then comes the LIKE
(or REGEXP
) which is much slower, but is not a problem because it is being applied only to those rows that passed the FT test. Example: "John Jones and Phil Smith" will pass the MATCH
, but then be filtered out by the LIKE
.
Some more notes: http://mysql.rjweb.org/doc.php/myisam2innodb#fulltext
Your question starts out implying "words", then digresses into cases where "ngram" may be better. If it is "names" (singers, music titles, etc), then ngram is probably not worth considering.