Search code examples
postgresqlfull-text-searchtsvector

Lexeme position in tsvector


I have the following lines of text:

"Blue pill"; "Red pill"; "Blue shift"; "Red eye".

I want to select rows, where Red is the first word or where Pill is the second. Hypothetically, it can be done with use of tsquery and tsvector, since output of tsvector contains also positions of each lexeme. However I didn't find any function allowing to access vectors lexemes by their numbers. Is there any proper way of selecting the rows, matching ts_query at the defined positions?


Solution

  • It is possible to do this with tsvector:

    with data as (
      select * from (
      VALUES (1, 'Blue pill'),
             (2, 'Red pill'),
             (3, 'Blue shift'),
             (4, 'Red eye')
      ) v(id, t)
    )
    select id, lexeme, positions
    FROM data
    CROSS JOIN unnest(to_tsvector(t)) u(lexeme, positions, weights)
    WHERE (lexeme = 'red' and positions @> '{1}')
    OR (lexeme = 'pill' and positions @> '{2}');
     id | lexeme | positions
    ----+--------+-----------
      1 | pill   | {2}
      2 | pill   | {2}
      2 | red    | {1}
      4 | red    | {1}
    (4 rows)
    
    

    I think it's likely easier to do this with a regular expression, though.