PostgreSQL full-text search 8 times slower than SQLite FTS search


I want to do some full-text searching. I looked up how to do it with SQLite and implemented that, but found it too slow, so I chose PostgreSQL instead, since that's the database I have been using anyway over the last few years.

I have a pretty simple table + trigger and an index:

CREATE TABLE submission
(
    id           TEXT NOT NULL,
    subreddit_id TEXT NOT NULL,
    author       TEXT,
    created_utc  INTEGER,
    title        TEXT NOT NULL,
    selftext     TEXT,
    permalink    TEXT,
    num_comments INTEGER,
    score        FLOAT,
    upvote_ratio FLOAT,
    fts          TSVECTOR,
    UNIQUE (id, subreddit_id)
);
CREATE FUNCTION submission_fts_trigger() RETURNS trigger AS
$$
BEGIN
    -- coalesce selftext: the column is nullable, and concatenating a
    -- NULL tsvector would make the whole fts value NULL
    new.fts :=
                setweight(to_tsvector('pg_catalog.english', new.title), 'A') ||
                setweight(to_tsvector('pg_catalog.english', coalesce(new.selftext, '')), 'B');
    return new;
END;
$$ LANGUAGE plpgsql;
    
CREATE TRIGGER trg_submission_idx_fts_update
    BEFORE INSERT OR UPDATE
    ON submission
    FOR EACH ROW
EXECUTE FUNCTION submission_fts_trigger();

CREATE INDEX search_idx_fts_idx ON submission USING GIN (fts);

The SQL query for the search:

SELECT * FROM submission, plainto_tsquery('foostring') AS q WHERE fts @@ q;

Now, while it works, it's 8 times slower than the SQLite solution, which is kind of weird; I would have expected the opposite. From what I've read in blog posts, the setup above should be fast. The table currently holds around 1,500 rows.

One search takes around 32 ms in PostgreSQL and around 4 ms in SQLite.

Am I missing something? Is there a way to make the search faster in PostgreSQL? Will SQLite get slower as I add more data to the table?

Would something like Elasticsearch be faster?

Edit: The result of EXPLAIN (ANALYZE, BUFFERS) for the query above is:

Nested Loop  (cost=12.31..36.12 rows=7 width=876) (actual time=2.517..2.518 rows=1 loops=1)
   Buffers: shared hit=34
  ->  Function Scan on plainto_tsquery q  (cost=0.25..0.26 rows=1 width=32) (actual time=2.472..2.472 rows=1 loops=1)   
        Buffers: shared hit=29 
  ->  Bitmap Heap Scan on submission  (cost=12.06..35.79 rows=7 width=844) (actual time=0.040..0.040 rows=1 loops=1)
        Recheck Cond: (fts @@ q.q)
        Heap Blocks: exact=1
        Buffers: shared hit=5
        ->  Bitmap Index Scan on search_idx_fts_idx  (cost=0.00..12.05 rows=7 width=0) (actual time=0.029..0.029 rows=1 loops=1)
              Index Cond: (fts @@ q.q)
              Buffers: shared hit=4
Planning:
 Buffers: shared hit=188
Execution Time: 2.611 ms

Edit 2: The measurements I took were from the function calling the SELECT statement, which is the same for both SQLite and PostgreSQL. Thanks to the EXPLAIN output I can see that PostgreSQL itself is fast enough; the JDBC wrapper I am using seems to be adding most of the overhead.
Thanks for the pointer.
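
For anyone who runs into the same thing, a rough sketch of how to time connection setup and query execution separately, so you can see where the wall-clock time actually goes (URL and credentials are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class SearchTiming {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:postgresql://localhost/mydb"; // placeholder
        long t0 = System.nanoTime();
        try (Connection conn = DriverManager.getConnection(url, "user", "pass")) {
            long t1 = System.nanoTime();
            String sql = "SELECT id FROM submission, "
                    + "plainto_tsquery('english', ?) AS q WHERE fts @@ q";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setString(1, "foostring");
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        // drain the result set so the query fully executes
                    }
                }
            }
            long t2 = System.nanoTime();
            System.out.printf("connect: %.1f ms, query: %.1f ms%n",
                    (t1 - t0) / 1e6, (t2 - t1) / 1e6);
        }
    }
}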


Solution

  • Thanks to the comments here it was clear that it's not the database's fault. So I did some digging and finally remembered that I had not added connection pooling yet and instead opened a new SQL connection for every database request.

    It turns out that opening a fresh connection per request takes about 10 times longer than going through a connection pool (a sketch of the pooled setup follows below).

    With a connection pool in place, PostgreSQL full-text search is slightly faster than SQLite, by around 10% in my setup right now.
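
    A minimal sketch of a pooled setup, using HikariCP as an example (any JDBC connection pool works similarly); URL and credentials are placeholders:

    import java.sql.Connection;
    import javax.sql.DataSource;
    import com.zaxxer.hikari.HikariConfig;
    import com.zaxxer.hikari.HikariDataSource;

    public class Pool {
        // Build the pool once at startup; afterwards getConnection()
        // hands out already-open connections instead of doing a fresh
        // TCP + auth handshake for every request.
        public static DataSource create() {
            HikariConfig config = new HikariConfig();
            config.setJdbcUrl("jdbc:postgresql://localhost/mydb"); // placeholder
            config.setUsername("user");                            // placeholder
            config.setPassword("pass");                            // placeholder
            config.setMaximumPoolSize(10);
            return new HikariDataSource(config);
        }
    }

    Each request then borrows a connection with try (Connection conn = ds.getConnection()) { ... }; closing it returns it to the pool instead of tearing it down.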