Search code examples
postgresqlfull-text-search

Postgres Full-Text Search with Hyphen and Numerals


I have observed what seems to me an odd behavior of Postgres' to_tsvector function.

SELECT to_tsvector('english', 'abc-xyz');

returns

'abc':2 'abc-xyz':1 'xyz':3

However,

SELECT to_tsvector('english', 'abc-001');

returns

'-001':2 'abc':1

Why not something like this?

'abc':2 'abc-001':1 '001':3

And what should I do to be able to search by the numeric portion alone, without the hyphen?


Solution

  • This can be circumvented with PG13's dict-int addon's absval option. See the official documentation.

    But in case you're stuck with an earlier PG version, here's the generalized version of a "number or negative number" workaround in a query.

    select regexp_replace($$'test' & '1':* & '2'$$::tsquery::text,
                '''([.\d]+''(:\*)?)', '(''\1 | ''-\1)', 'g')::tsquery;
    

    This results in:

    'test' & ( '1':* | '-1':* ) & ( '2' | '-2' )
    

    It replaces lexemes that look like positive numbers with "number or negative number" kind of subqueries.
    The double cast ::tsquery::text is just there to show how you would pass a tsquery casted to text.
    Note that it handles prefix matching numeric lexemes as well.