I'm trying to show a "related articles" block on my website. For this purpose I use the following query:
SELECT *, MATCH(title, content) AGAINST('search string') AS score
FROM news_items
WHERE MATCH(title, content) AGAINST('search string')
ORDER BY score DESC LIMIT 4
However, it often shows me very old articles, while I want to show the latest ones, so I changed my query as follows:
SELECT *, MATCH(title, content) AGAINST('search string') AS score
FROM news_items
WHERE MATCH(title, content) AGAINST('search string') > 4
ORDER BY ctime DESC, score DESC LIMIT 4
But in this case I can't get the most relevant articles :(
Any ideas on how to find the MOST RELEVANT AND FRESH articles at the same time?
Thanks in advance!
You could change to an aggregate score... something like this:
SELECT *,
    (
        MATCH(title, content) AGAINST('search string')
        -
        (ABS(DATEDIFF(`timestampfield`, NOW())) / 365)
    ) AS score
FROM news_items
WHERE
    MATCH(title, content) AGAINST('search string') > 4
ORDER BY score DESC LIMIT 4
In that there's one kinda funky addition, which you'd want to clean up:
- (ABS(DATEDIFF(`timestampfield`, NOW())) / 365)
This is your age component of the score, currently scaled so that 1 year = 1 point.
To get that, we start by getting the number of days between the timestamp field and now (absolute value):
ABS(DATEDIFF(`timestampfield`, NOW()))
Then we scale...
I decided you probably didn't want to lose score based on the raw number of days, because something 30 days old would get -30, which seems too harsh. So I chose years. If you want to scale by weeks instead, divide by 7 instead of 365, and so forth.
This scaling factor is how you control the balance between match score and age.
So it ends up being something like: <match score> - <yearsAgo>
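To get a feel for how big the age penalty is at different scales, here is a small sketch you can run on its own. The dates are made up for illustration, and a literal date stands in for NOW() so the numbers are reproducible:

```sql
-- Age penalty for fixed, made-up dates (second argument plays the role of NOW()).
SELECT
    -- 30 days old, scaled by years: roughly 0.08 points
    ABS(DATEDIFF('2024-01-01', '2024-01-31')) / 365 AS penalty_30_days_yearly,
    -- 365 days old, scaled by years: 1 point
    ABS(DATEDIFF('2023-01-31', '2024-01-31')) / 365 AS penalty_1_year_yearly,
    -- 30 days old, scaled by weeks: roughly 4.29 points
    ABS(DATEDIFF('2024-01-01', '2024-01-31')) / 7   AS penalty_30_days_weekly;
```

Compare these numbers against the match scores your own data produces (the query in the question filters on a score above 4) to decide which scale gives age a sensible amount of influence.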
NOTE: this assumes your timestamp field is a full date-time field. If it is not, you will need to cast it to a date, or add the logic to manipulate the Unix timestamp directly.
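If the column turns out to hold a Unix timestamp, one possible variant is sketched below. It assumes (this is not confirmed anywhere in the question) that the `ctime` column from the question is an integer number of seconds since the epoch, and converts it with MySQL's FROM_UNIXTIME() before taking the date difference:

```sql
-- Assumption: ctime is an INT column holding Unix time in seconds.
SELECT *,
    (
        MATCH(title, content) AGAINST('search string')
        - (ABS(DATEDIFF(FROM_UNIXTIME(ctime), NOW())) / 365)
    ) AS score
FROM news_items
WHERE MATCH(title, content) AGAINST('search string') > 4
ORDER BY score DESC
LIMIT 4;
```

If `ctime` is already a DATETIME or TIMESTAMP column, drop the FROM_UNIXTIME() call and pass the column to DATEDIFF() directly.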
And here's a debugging version of the query:
SELECT
    `created`,
    MATCH(title, content) AGAINST('awesome') AS match_score,
    (ABS(DATEDIFF(`created`, NOW())) / 365) AS years_ago,
    (
        MATCH(title, content) AGAINST('awesome')
        -
        (ABS(DATEDIFF(`created`, NOW())) / 365)
    ) AS score
FROM news_items
WHERE
    MATCH(title, content) AGAINST('awesome') > 4
ORDER BY score DESC LIMIT 4
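If you want to experiment with the scaling factor without editing the query in several places, a rough sketch is to pull it out into a MySQL user variable. The name `@days_per_point` is made up for this example; it is the number of days of age that cost one point of score:

```sql
-- Hypothetical tuning knob: days of age that cost one point of score.
SET @days_per_point = 365;  -- try 7 for weeks, 30 for roughly months

SELECT
    `created`,
    MATCH(title, content) AGAINST('awesome') AS match_score,
    ABS(DATEDIFF(`created`, NOW())) / @days_per_point AS age_penalty,
    MATCH(title, content) AGAINST('awesome')
        - ABS(DATEDIFF(`created`, NOW())) / @days_per_point AS score
FROM news_items
WHERE MATCH(title, content) AGAINST('awesome') > 4
ORDER BY score DESC
LIMIT 4;
```

Looking at match_score and age_penalty side by side for a few real searches should make it clear whether age is dominating the ranking or barely affecting it.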