So I want to do some full-text searching. I looked up how to do it with SQLite, implemented that, and found it too slow, so I chose PostgreSQL instead, as I have been using that database anyway for the last few years.
I have a pretty simple table + trigger and an index:
CREATE TABLE submission
(
    id           TEXT NOT NULL,
    subreddit_id TEXT NOT NULL,
    author       TEXT,
    created_utc  INTEGER,
    title        TEXT NOT NULL,
    selftext     TEXT,
    permalink    TEXT,
    num_comments INTEGER,
    score        FLOAT,
    upvote_ratio FLOAT,
    fts          TSVECTOR,
    UNIQUE (id, subreddit_id)
);

CREATE FUNCTION submission_fts_trigger() RETURNS trigger AS
$$
BEGIN
    new.fts :=
        setweight(to_tsvector('pg_catalog.english', new.title), 'A') ||
        -- coalesce the nullable selftext: concatenating a NULL tsvector
        -- would make the whole fts column NULL
        setweight(to_tsvector('pg_catalog.english', coalesce(new.selftext, '')), 'B');
    RETURN new;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_submission_idx_fts_update
    BEFORE INSERT OR UPDATE
    ON submission
    FOR EACH ROW
EXECUTE FUNCTION submission_fts_trigger();

CREATE INDEX search_idx_fts_idx ON submission USING GIN (fts);
The SQL query for the search:
"SELECT * FROM submission, plainto_tsquery('foostring') AS q WHERE (fts @@ q);"
Now, while it works, it's 8 times slower than the SQLite solution, which is kind of weird; I would have expected a different outcome. I also looked around, and according to some blog posts the solution above should be fast. The table currently holds around 1,500 rows.
One search in PostgreSQL takes around 32 ms and one search in SQLite around 4 ms.
Am I missing something? Is there a way to make the search faster in PostgreSQL? Will SQLite get slower if I add more data to the table?
Would something like Elasticsearch be faster?
Edit: The result of EXPLAIN (ANALYZE, BUFFERS) is:
Nested Loop (cost=12.31..36.12 rows=7 width=876) (actual time=2.517..2.518 rows=1 loops=1)
  Buffers: shared hit=34
  ->  Function Scan on plainto_tsquery q (cost=0.25..0.26 rows=1 width=32) (actual time=2.472..2.472 rows=1 loops=1)
        Buffers: shared hit=29
  ->  Bitmap Heap Scan on submission (cost=12.06..35.79 rows=7 width=844) (actual time=0.040..0.040 rows=1 loops=1)
        Recheck Cond: (fts @@ q.q)
        Heap Blocks: exact=1
        Buffers: shared hit=5
        ->  Bitmap Index Scan on search_idx_fts_idx (cost=0.00..12.05 rows=7 width=0) (actual time=0.029..0.029 rows=1 loops=1)
              Index Cond: (fts @@ q.q)
              Buffers: shared hit=4
Planning:
  Buffers: shared hit=188
Execution Time: 2.611 ms
Edit 2: The measurements I took were from the function calling the SELECT statement, which is the same for both SQLite and PostgreSQL. Thanks to the analyzer I can see that PostgreSQL itself is fast enough; the JDBC wrapper I am using seems to be adding a lot of overhead somewhere.
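If you want to see this split in your own setup, a quick sketch like the following (again with placeholder connection details) times the connection setup separately from the statement itself:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class TimingCheck {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:postgresql://localhost:5432/mydb"; // placeholder
        long t0 = System.nanoTime();
        try (Connection conn = DriverManager.getConnection(url, "user", "secret")) {
            long t1 = System.nanoTime();
            try (PreparedStatement ps = conn.prepareStatement(
                    "SELECT id FROM submission WHERE fts @@ plainto_tsquery('english', ?)")) {
                ps.setString(1, "foostring");
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        // Drain the result set so the full round trip is measured
                    }
                }
            }
            long t2 = System.nanoTime();
            System.out.printf("connect: %d ms, query: %d ms%n",
                    (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
        }
    }
}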
Thanks for the pointer.
Thanks to the comments here it was clear that it's not the database's fault. So I did some digging and finally remembered that I had not added connection pooling yet and was instead opening a new SQL connection for every database request.
It turns out that doing that takes about 10 times longer than going through a connection pool.
With a connection pool in place, PostgreSQL full-text search is slightly faster than SQLite, by around 10% for my setup right now.
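For anyone running into the same thing, the pooled setup looks roughly like this. This is just a sketch using HikariCP as an example pool (any JDBC pool should do; URL, credentials, and pool size are placeholders):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class PooledSearch {
    // One pool for the whole application; connections are reused, not reopened
    private static final HikariDataSource POOL = createPool();

    private static HikariDataSource createPool() {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:postgresql://localhost:5432/mydb"); // placeholder
        config.setUsername("user");
        config.setPassword("secret");
        config.setMaximumPoolSize(10);
        return new HikariDataSource(config);
    }

    public static void main(String[] args) throws Exception {
        // getConnection() now hands out a pooled connection instead of
        // opening a fresh one for every request
        try (Connection conn = POOL.getConnection();
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT title FROM submission WHERE fts @@ plainto_tsquery('english', ?)")) {
            ps.setString(1, "foostring");
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString("title"));
                }
            }
        }
    }
}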