Search code examples
postgresqlfull-text-searchtsvector

Postgresql tsvector not searching few strings


I am using PostgreSQL 11, created tsvector with gin index on column search_fields.

Data in table test

id   |           name           |         search_fields          
-------+--------------------------+--------------------------------

19973 | Ongoing 10x consultation | '10x' 'Ongoing' 'consultation'

19974 |  5x marketing | '5x' 'marketing'

19975 | Ongoing 15x consultation | '15x' 'Ongoing' 'consultation'

The default text search config is set as 'pg_catalog.english'.

Below both queries output 0 rows.

select id, name, search_fields from test where search_fields @@ to_tsquery('ongoing');

id | name | search_fields 
----+------+---------------
(0 rows)


select id, name, search_fields from test where search_fields @@ to_tsquery('simple','ongoing');

id | name | search_fields 
----+------+---------------
(0 rows)

But when I pass string as '10x' or 'consultation' it returns the correct output.

Any idea, why it is not searching for 'ongoing' words?

Afterwards, I have created the triggers using function tsvector_update_trigger() and update the search_fields and also set default_text_search_config to 'pg_catalog.simple' in postgresql.conf file, then I updated the search_fields with search_fields and it output as

select id, name, search_fields from test where search_fields @@ to_tsquery('ongoing');
 id |              name               |              search_fields              
----+---------------------------------+-----------------------------------------
  19973  | Ongoing 10x consultation | '10x':2 'consultation':3 'ongoing':1

This time when I ran query passing 'ongoing' string it output as per the expected result.

select id, name, search_fields from test where search_fields @@ to_tsquery('ongoing');

id   |           name           |         search_fields          
-------+--------------------------+--------------------------------

19973 | Ongoing 10x consultation | '10x':2 'consultation':3 'ongoing':1

19975 | Ongoing 15x consultation | '15x':2 'consultation':3 'ongoing':1

As per above experiment, setting trigger and default_text_search_config to 'pg_catalog.simple' help to achieve the result.

Now, I don't know what is the reason why it didn't work with default_text_search_config to 'pg_catalog.english'.

Is trigger always required when tsvector is used?

Any help in understanding the difference between both would be appreciated.

Thanks, Nishit


Solution

  • You don't describe how you create your search_fields initially. It was not constructed correctly. Since we don't know what you did, we don't know what you did wrong. If you rebuild it correctly, then it will start working. When you changed default_text_search_config to 'simple', you appear to have correctly repopulated the search_fields, which is why it worked. If you change back to 'english' and correctly repopulate the search_fields then it will also work.

    You don't always need a trigger. A trigger is one way. Another way is to just manually update the tsvector column every time you update the text column. My usual favorite way is not to store the tsvector at all, and just derive it on the fly:

    select id, name, search_fields from test where 
        to_tsvector('english',name) @@ to_tsquery('english','ongoing');
    

    If you want to do it this way, you need to specify the configuration, not rely on default_text_search_config, otherwise the expressional gin index will not be used. Also, this way is not a good idea if you want to use phrase searching, as the rechecking will be slow.