Search code examples
sqlpostgresqlfull-text-searchunnesttsvector

Select lexemes from unnested ts_vectors


I try to select only lexemes from unnested ts_vector column:

select lexeme
from 
    (select unnest(to_tsvector('russian', description))
     from cards) as roots;

But it doesn't work, because SQL doesn't know anything about lexeme column. How can I select only lexemes from unnested ts_vectors?


Solution

  • What you found yourself:

    SELECT (unnest(to_tsvector(description))).lexeme
    FROM   cards;
    

    The equivalent standard SQL form with the set-returning function in the FROM list is slightly more verbose, but easier to integrate in bigger queries:

    SELECT d.lexeme
    FROM   cards c
    LEFT   JOIN LATERAL unnest(to_tsvector(c.description))) d ON true;
    

    Related:

    Why? How?

    Since Postgres 9.6 there is a second "overloaded" variant of unnest(). Quoting the release notes:

    • Add new functions for tsvector data (Stas Kelvich)

      The new functions are ts_delete(), ts_filter(), unnest(), tsvector_to_array(), array_to_tsvector(), and a variant of setweight() that sets the weight only for specified lexeme(s).

    Bold emphasis mine.

    See:

    SELECT proname, proargtypes::regtype[], prorettype::regtype
    FROM   pg_proc
    where  proname = 'unnest';
    
    proname | proargtypes      | prorettype
    --------+------------------+-----------
    unnest  | [0:0]={anyarray} | anyelement
    unnest  | [0:0]={tsvector} | record    
    (2 rows)
    

    db<>fiddle here

    The function is documented in the manual among text search functions:

    unnest ( tsvector ) → setof record ( lexeme text, positions smallint[], weights text )

    It returns setof record with named output columns. Hence we can refer to the column lexeme directly like we did.