I'm trying to use Full text search on postgresql:
select *
from entertainement
where to_tsvector('simple', name) @@ to_tsquery('simple', 'word_to_search:*')
This query works well and give me what I want to display.However I found on some websites when I enter a word that is not found it shows me No result found for 'word_to_search'
and give me some other propositions to some words similar to it.
For example if I put the word activityng
I got
No result found for activityng
But it gives me some propositions containing word activity
. However when I put the word activityns
I got :
No result found for activityns
But I got some propositions containing the word activities
. I didn't understand the logic of it because I thnik activityns is similiar to activity
than to activities
.
I tried to know the similiarity of this word using similarity
of pg_trgm
and I got:
select similarity('activity','activityns');
similarity: 0,6666667
select similarity('activities','activityns');
similarity: 0,4666667
Is there any other solutions to detect similarity between the words and gives more precise results?
FTS first reduce the token to lexeme and then compare, trigram compares three letters - you can't compare comparison results with so different algorithms, hereis example for FTS (showing why one is closerto another in your sample):
t=# with w(v) as (values('activityns'),('activity'),('activities'),('activit'))
select to_tsvector(v),v, to_tsvector(v) @@ to_tsquery('activ:*'),to_tsvector(v) @@ to_tsquery('activity'),to_tsvector(v) @@ to_tsquery('activit:*') from w;
to_tsvector | v | ?column? | ?column? | ?column?
---------------+------------+----------+----------+----------
'activityn':1 | activityns | t | f | t
'activ':1 | activity | t | t | f
'activ':1 | activities | t | t | f
'activit':1 | activit | t | f | t
(4 rows)
look to which lexeme the word is reduced (first column) and take a look at what results give you wildcard usage depending on lexeme to which the word is reduced (3,4,5) columns