Search code examples
sqldatabasepostgresqlindexingfull-text-search

Full text search for compound words


I am looking into PostgreSQL full text search and would like to know if it is possible to search for the second part of compound words.

It there a way to get 'Cheesecake' as a result when I search for 'cake'?

-- Lets have a table like this:
CREATE TABLE IF NOT EXISTS table1(
    id INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    col1 TEXT,
    col1_tsv TSVECTOR
);
CREATE INDEX IF NOT EXISTS col1_index ON table1 USING gin(col1_tsv);
-- Insert some data into it:
INSERT INTO table1 (col1, col1_tsv)
VALUES ('Blacklist', TO_TSVECTOR('Blacklist')),('Cheesecake', TO_TSVECTOR('Cheesecake'));

If I search for 'cake' or 'list' I get no results.

SELECT col1 FROM table1 WHERE col1_tsv @@ to_tsquery('english', 'list');
SELECT col1 FROM table1 WHERE col1_tsv @@ to_tsquery('english', 'cake');

Check it with ts_lexize:

select ts_lexize('english_stem','Blacklist');
select ts_lexize('english_stem','Cheesecake');

The output:

  ts_lexize  
-------------
 {blacklist}
(1 row)

  ts_lexize  
-------------
 {cheesecak}
(1 row)

Works as designed but is there a way to get the Cheesecake just from searching for cake? (And I do not mean)

select * from table1 where col1 like '%cake%';

When I select the whole table the Cheesecake is also cut into Cheesecak.

select * from table1;
 id |    col1    |   col1_tsv    
----+------------+---------------
  1 | Blacklist  | 'blacklist':1
  2 | Cheesecake | 'cheesecak':1

Solution

  • Full text search is capable of prefix matching. See:

    But that only works for left-anchored searches. Your pattern is right-anchored.

    You could build an index on reversed strings and search with reversed patterns:

    CREATE INDEX table1_col1_rtsv_idx ON table1 USING gin (TO_TSVECTOR('simple', reverse(col1)));
    

    Then this query with prefix search can use the new index:

    SELECT col1 FROM table1
    WHERE  to_tsvector('simple', reverse(col1))
        @@ to_tsquery('simple', reverse('cake') || ':*');
    

    But I would consider a trigram index instead. See:

    CREATE INDEX table1_col1_gin_trgm_idx ON table1 USING gin (col1 gin_trgm_ops);
    

    Query:

    SELECT col1 FROM table1
    WHERE  col1 LIKE '%cake';
    

    Notably, the pattern is '%cake', not '%cake%', if "cake" shall be at the end of the string. But the trigram index supports this just as well:

    SELECT col1 FROM table1
    WHERE  col1 LIKE '%cake%';
    

    db<>fiddle here

    Text search indexes are typically substantially smaller than trigram indexes - and therefore a bit faster. There are lots of other subtle differences ...