Search code examples
mysqlfull-text-searchinnodbmariadbfull-text-indexing

mysql full text search right index


lets say I have table posts with this columns: top_totle,title,sub_title,text

I need to have full text search on all this column and order by relevance where top_title need to be more important then title etc.

so i have 2 question which are the same, what is the best way of making an index for this and how to format the query to best support this index?

index options: I can create combined full text index on all of this column or create separate index for each of this column

which is the prefered way? option 1:

SELECT
  title,
  MATCH (top_title) AGAINST ('text' IN BOOLEAN MODE) as toptitle_score,
  MATCH (title) AGAINST ('text' IN BOOLEAN MODE) as title_score,
  MATCH (sub_text) AGAINST ('text' IN BOOLEAN MODE) as sub_text_score,
FROM
  `posts`
WHERE
  MATCH (top_title,title , sub_text ) AGAINST ('text' IN BOOLEAN MODE)
  and `posts`.`deleted_at` IS NULL
  AND `published_at` IS NOT NULL
Order by toptitle_score desc, 
Order by title_score desc , 
Order by subtext_score desc

option 2:

SELECT
  title,
  MATCH (top_title) AGAINST ('text' IN BOOLEAN MODE) as toptitle_score,
  MATCH (title) AGAINST ('text' IN BOOLEAN MODE) as title_score,
  MATCH (sub_text) AGAINST ('text' IN BOOLEAN MODE) as sub_text_score,
FROM
  `posts`
WHERE
  (MATCH (top_title) AGAINST ('text' IN BOOLEAN MODE)
  or MATCH (title) AGAINST ('text' IN BOOLEAN MODE)
  or MATCH (sub_text) AGAINST ('text' IN BOOLEAN MODE))
  and `posts`.`deleted_at` IS NULL
  AND `published_at` IS NOT NULL
Order by toptitle_score desc, 
Order by title_score desc , 
Order by subtext_score desc

option 3:

is there some smarter way?

Solution

  • Option 1 is good. It needs 4 FT indexes (one per column, plus one with all 3 columns). Don't repeat ORDER BY:

    ORDER BY toptitle_score DESC , 
             title_score    DESC , 
             subtext_score  DESC
    

    Option 2 is not a viable contender. It needs only 3 indexes (not much savings) but is a lot slower due to OR.

    Option 3... (Option 1, as fixed, plus...)

    The ORDER BY you are using is probably "wrong" for what you want. For example, it will shove to the end of the list any rows without text in toptitle. Perhaps you want some "weighted" version:

    ORDER BY
       9 * top_title_score  +
       3 * title_score      +
       1 * sub_text_score  DESC
    

    (The 9,3,1 are rather arbitrary. It says that if 'text' shows up more than 3 times in title, that is more important than showing up once in top_title -- or something like that.)