I am looking into PostgreSQL full text search and would like to know if it is possible to search for the second part of compound words.
-- Lets have a table like this:
CREATE TABLE IF NOT EXISTS table1(
id INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
col1 TEXT,
col1_tsv TSVECTOR
);
CREATE INDEX IF NOT EXISTS col1_index ON table1 USING gin(col1_tsv);
-- Insert some data into it:
INSERT INTO table1 (col1, col1_tsv)
VALUES ('Blacklist', TO_TSVECTOR('Blacklist')),('Cheesecake', TO_TSVECTOR('Cheesecake'));
If I search for 'cake' or 'list' I get no results.
SELECT col1 FROM table1 WHERE col1_tsv @@ to_tsquery('english', 'list');
SELECT col1 FROM table1 WHERE col1_tsv @@ to_tsquery('english', 'cake');
Check it with ts_lexize:
select ts_lexize('english_stem','Blacklist');
select ts_lexize('english_stem','Cheesecake');
The output:
ts_lexize
-------------
{blacklist}
(1 row)
ts_lexize
-------------
{cheesecak}
(1 row)
Works as designed but is there a way to get the Cheesecake just from searching for cake? (And I do not mean)
select * from table1 where col1 like '%cake%';
When I select the whole table the Cheesecake is also cut into Cheesecak.
select * from table1;
id | col1 | col1_tsv
----+------------+---------------
1 | Blacklist | 'blacklist':1
2 | Cheesecake | 'cheesecak':1
Full text search is capable of prefix matching. See:
But that only works for left-anchored searches. Your pattern is right-anchored.
You could build an index on reversed strings and search with reversed patterns:
CREATE INDEX table1_col1_rtsv_idx ON table1 USING gin (TO_TSVECTOR('simple', reverse(col1)));
Then this query with prefix search can use the new index:
SELECT col1 FROM table1
WHERE to_tsvector('simple', reverse(col1))
@@ to_tsquery('simple', reverse('cake') || ':*');
But I would consider a trigram index instead. See:
CREATE INDEX table1_col1_gin_trgm_idx ON table1 USING gin (col1 gin_trgm_ops);
Query:
SELECT col1 FROM table1
WHERE col1 LIKE '%cake';
Notably, the pattern is '%cake'
, not '%cake%'
, if "cake" shall be at the end of the string. But the trigram index supports this just as well:
SELECT col1 FROM table1
WHERE col1 LIKE '%cake%';
db<>fiddle here
Text search indexes are typically substantially smaller than trigram indexes - and therefore a bit faster. There are lots of other subtle differences ...