Search code examples
mysqlsqlsortingfull-text-searchfull-text-indexing

MySQL fulltext search and sort by relevance + TIME


I'm trying to show "related articles" block on my website. For this purpose I use the following query:

SELECT *, MATCH(title, content) AGAINST('search string') AS score 
FROM news_items 
WHERE MATCH(title, content) AGAINST('search string') 
ORDER BY score DESC LIMIT 4

However, it often shows me very old articles while I want to show the latest ones so I fixed my query as following:

SELECT *, MATCH(title, content) AGAINST('search string') AS score 
FROM news_items 
WHERE MATCH(title, content) AGAINST('search string') > 4 
ORDER BY ctime DESC, score DESC LIMIT 4

But in this case I can't get most relevant articles :(

Any ideas on how to find MOST RELEVANT AND FRESH article at the same time?

Thanks in advance!


Solution

  • You could change to an aggregate score... something like this:


    SELECT *, 
      (
        MATCH(title, content) AGAINST('search string')
        - 
        (ABS(DATEDIFF(`timestampfield`, NOW())) / 365)
      ) AS score 
    FROM news_items 
    WHERE 
      MATCH(title, content) AGAINST('search string') > 4 
    ORDER BY score DESC LIMIT 4
    

    In that there's one kinda funky addition, which you'd want to clean up:

    - (ABS(DATEDIFF(`timestampfield`, NOW())) / 365)
    

    This is your age component of the score... currently scaled by <year> = 1 point

    To get that, we start by getting the number of days between the timestamp field and now (absolute value):

    ABS(DATEDIFF(`timestampfield`, NOW()))
    

    Then we scale...

    I decided you probably didn't want to loose score based on number of days, because if something was 30 days old it would be -30... seems too harsh. So I chose years... if you want to scale on number of weeks, divide by 52 instead of 365... and so forth.

    This scaling factor will be how you control value between scoring matching and age.

    So it ends up being something like: <match score> - <yearsAgo>


    If you do that:

    1. 5 (match score) - 0.1 (<1 year ago) = 4.9 (ok match, but newest)
    2. 5 (match score) - 0.01 (<1 year ago) = 4.99
    3. 5 (match score) - 1 (1 year ago) = 4
    4. 6 (match score) - 2 (2 years ago) = 4
    5. 9 (match score) - 5 (5 years ago) = 4 (best match, but old)
    6. 7 (match score) - 10 (10 years ago) = -3

    NOTE this assumes your timestamp field is a full date-time field... if otherwise, you will need to re-cast to a date, or the logic to manipulate the unix timestamp directly.

    And here's a debugging version of the query:

    SELECT
        `created`,
        MATCH(title, content) AGAINST('awesome') as match_score,
        (ABS(DATEDIFF(`created`, NOW())) / 365) as years_ago,
      (
        MATCH(title, content) AGAINST('awesome')
        - 
        (ABS(DATEDIFF(`created`, NOW())) / 365)
      ) AS score 
    FROM news_items 
    WHERE 
      MATCH(title, content) AGAINST('awesome') > 4 
    ORDER BY score DESC LIMIT 4