Search code examples
postgresqlfull-text-searchpsqlinverted-index

PostgreSQL full-text index search syntax issues between different versions


We had an issue a couple of weeks back where our text-index query wasn't yielding any results. This was the expression:

"test_column" @@ websearch_to_tsquery('english', "in_search_argument")

We had to update it to:

"test_column_idx" @@ websearch_to_tsquery('english', "in_search_argument")

..where test_column_idx is defined as:

"test_column_idx" gin (to_tsvector('english'::regconfig, 'test_column'::text))

Our server is 14.6 (with psql 12.16).

However, my development environment complains about this:

sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedColumn) column p.test_column_idx does not exist

If I change it back to use the column (see above; it matched just fine in our development environment even if our production didn't), it works again. Our development environment uses server 14.9 (we closed the gap between versions while debugging) and psql 15.4.

What syntax should I be using to maintain compatibility between environments?

Added (thanks @RichardHuxton):

So, SQLAlchemy creates it as an index in development

"test_column_idx" gin (to_tsvector('english'::regconfig, 'test_column'::text))

The SQLAlchemy definition looks like:

sqlalchemy.Index(
    'test_column_idx',

    sqlalchemy.sql.func.to_tsvector(
        sqlalchemy.literal(_TEXT_INDEX_LANGUAGE),
        'test_column'),

    postgresql_using='gin'),

When I actually deployed it, I had manually added it to production as a column:

test_column_idx      | tsvector          |           |          | generated always as (to_tsvector('english'::regconfig, test_column::text)) stored

So, can anyone suggest how I might adjust the SQLAlchemy definition to create the column instead?

That said, I'm confused how index semantics can be installed as both a column and an index, and the situations where one or the other would be preferred. I could use some clarification.

Thank you.


Solution

  • The issue was simple, but the proper flow is also deceptively simple and what I did subtly deviated.

    Somehow we had ended-up with a tsvector column and no index, our search function used this column, and to make matters more confusing, I had inadvertently named this column the intended name of the index (idx_product_1). So, when I was looking for bugs and reviewing the definition of the search function, I had missed this the first couple of times.

    Since what we had worked exactly as intended (see where the issue was, below), we were even more confused. We finally figured-out that:

    1. The vector column is created to auto-populate from the normal data
    2. The index is created from the vector column
    3. The query is executed against the vector column (not against the index directly, which would not be normal but I was getting my wires crossed; see previous comments/answers)
    4. The vector column will transparently invoke the index (like normal) if one is found
    5. We don't need the index for the right results. We just need it to optimize the search.
    6. We do need to perform the text queries against the vector column because, even though the text-query against the original column won't fail, we'll get incorrect results when doing so (not usually wrong, but usually incomplete).

    Therefore, we did this to autopopulate the tsvector column:

    ALTER TABLE "products" ADD "style_name_vector" tsvector GENERATED ALWAYS AS (
         to_tsvector('simple', "internal_sku") || ' ' ||
         to_tsvector('english', "human_brand") || ' ' ||
         to_tsvector('simple', "style_id") || ' ' ||
         to_tsvector('english', "style_name") || ' ' ||
         to_tsvector('english', "gender") || ' ' ||
         to_tsvector('simple', shopify_product_id::varchar(255))
    ) STORED;
    

    Because the tsvector column is a space-separated list of word-vector representations, we can simply concatenate a bunch of vectored columns in order to search all of those columns simultaneously. This was an optimization not in the previous design.

    Create the index:

    CREATE INDEX "idx_product_1" ON "products" USING GIN("style_name_vector");
    

    Do a search (just for a complete example):

    analytics=> select internal_sku, style_name from products where style_name_vector @@ websearch_to_tsquery('english', 'trail') limit 5;
     internal_sku |               style_name               
    --------------+----------------------------------------
     TN100000926  | Mens Paramount Trail Convertible Pants
     RE000127387  | Trail 40
     PA000010810  | Womens Capilene Cool Trail Shirt
     SA000023516  | S Lab trail Running Shirt
     MA100000254  | Womens Vapor Trail Hoodie