Search code examples
javasqlpostgresqlpgadmin-4tsvector

plainto_tsquery in english config is stemming the given words into incorrect root words. Ex : "create" -> "creat", "scheduled" -> "schedul"


I am trying to convert the given query word into a ts_query which I can use in the where condition for select query. I am not getting the records in query as this plainto_tsquery is giving the invalid root word for English config.

Database demo table :

id ts_vector
1 create:1,test:2
3 schedule:1,test:2

Query : select * from demo where ts_vector @@ plainto_tsquery("english","create")

0 records

Query : select * from demo where ts_vector @@ plainto_tsquery("simple","create")

id ts_vector
1 create:1,test:2

found that the issue is in the plainto_tsquery with English config, it stems the given words as below. so the query is returning 0 records as there are no records with the given root words in ts_vector.

select plainto_tsquery("English","create")

'creat'

select plainto_tsquery("simple","schedule")

'schedul'

Why is this happening? is 'creat' the root word for create in English? I have updated the Postgres dictionary but no change in the result.


Solution

  • That is working as expected.

    The English text search configuration uses the Snowball dictionary english_stem for natural language words:

    \dF+ english
    
    Text search configuration "pg_catalog.english"
    Parser: "pg_catalog.default"
          Token      │ Dictionaries 
    ═════════════════╪══════════════
     asciihword      │ english_stem
     asciiword       │ english_stem
     email           │ simple
     file            │ simple
     float           │ simple
     host            │ simple
     hword           │ english_stem
     hword_asciipart │ english_stem
     hword_numpart   │ simple
     hword_part      │ english_stem
     int             │ simple
     numhword        │ simple
     numword         │ simple
     sfloat          │ simple
     uint            │ simple
     url             │ simple
     url_path        │ simple
     version         │ simple
     word            │ english_stem
    

    The Snowball dictionary does not understand the language, so that it can stem correctly, but it uses heuristics for stemming that are usually good enough. However, "creat" is the correct stemming for "create": think of "creating".

    If you created the tsvector using the simple text search configuration, you cannot expect to find it with a tsquery that was generated with a different text search configuration.