Search code examples
performancepostgresqlindexingfull-text-searchispell

Postgres tsearch performance on first query


We are using a custom text search configuration for searching in german texts for a proper support of compound words.

The dictionary can be found here: http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/ (ispell-german-compound.tar.gz).

The dicts got converted to UTF8 and I used the following script to add the configuration to the database:

DROP TEXT SEARCH DICTIONARY IF EXISTS german_bon_ispell CASCADE;
DROP TEXT SEARCH DICTIONARY IF EXISTS german_bon_stem CASCADE;

CREATE TEXT SEARCH CONFIGURATION german_bon (copy=german);

CREATE TEXT SEARCH DICTIONARY german_bon_stem (
TEMPLATE = snowball,
  Language = german,
  StopWords = german
);

CREATE TEXT SEARCH DICTIONARY german_bon_ispell (
TEMPLATE = ispell,
  dictfile = german,
  afffile = german,
  StopWords = german
);

ALTER TEXT SEARCH CONFIGURATION german_bon
  ALTER MAPPING FOR
    asciiword,word,numword,numhword,hword_asciipart,hword_part,hword_numpart
  WITH german_bon_ispell, german_bon_stem;

The dictionary itself works great, but on every new connection/session the first query using this configuration takes 1-2 secounds. Every following ~1-3ms.

This effect is also observable for the english dictionary, but not that drastic:

db=# \timing
Timing is on.
db=# select ts_debug('english', 'Book');
                               ts_debug
-----------------------------------------------------------------------
 (asciiword,"Word, all ASCII",Book,{english_stem},english_stem,{book})
(1 row)

Time: 6,977 ms
db=# select ts_debug('english', 'Book');
                               ts_debug
-----------------------------------------------------------------------
 (asciiword,"Word, all ASCII",Book,{english_stem},english_stem,{book})
(1 row)

Time: 2,258 ms
db=# select ts_debug('german_bon', 'Buch');
                                             ts_debug
---------------------------------------------------------------------------------------------------
 (asciiword,"Word, all ASCII",Buch,"{german_bon_ispell,german_bon_stem}",german_bon_ispell,{buch})
(1 row)

Time: 916,286 ms
db=# select ts_debug('german_bon', 'Buch');
                                             ts_debug
---------------------------------------------------------------------------------------------------
 (asciiword,"Word, all ASCII",Buch,"{german_bon_ispell,german_bon_stem}",german_bon_ispell,{buch})
(1 row)

Time: 1,240 ms
db=#

The only work-around I'm currently aware of is the use of persistent connections/connection pooling and we are using pgbouncer for that. But that introduces some other problems with out client (PHP>PDO>Doctrine) which looks like a caching issue.

Is there any way to reduce this "startup time"? It kind of looks like the configuration is loaded/created for each new connection what just doesn't seem reasonable.


Solution

  • It is known problem - loading ispell dictionary is slow (it is loaded every time when dictionary is used first time in session). One and good solution is session pooling. Other solution is using shared ispell dictionary - extension that was written by Tomas Vondra - shared_ispell, but I don't know how well some new version of PostgreSQL 9.2 and higher are supported - it is tested on 9.2. There can be problem with German language - it is tested with Czech lang.

    Other possibility is using German snowball dictionary - it should be significantly faster - but the result can be worse. Remove german_bon_ispell from your fulltext configuration.