Search code examples
postgresqlindexingsqlalchemyfull-text-searchflask-sqlalchemy

How do I make sqlalchemy/postgres use the GIN index in full text search


So, I have been using Postgres/sqlalchemy to do full text search for a while. No problems, but now it has started to become slow because number of items in table is starting to become high (like 600k items, which takes ~10+ seconds). So, decided to look into indexing with gin.

I am able to create the index ok.

Problem is, it is not being used (0 scans when I look in pgAdmin), so I am wondering what I am doing wrong, or if I have to force use of the index or something.


My table prior to indexing (flask-sqlalchemy syntax):

class Parent_product(db.Model):
    id = ....
    name = db.Column(db.String) 

And I would be doing searches like this (disregard legacy sqlalchemy syntax):

query = db.session.query(Parent_product)
query = query.order_by(func.similarity(Parent_product.name, search_string).desc())
query = query.filter(func.similarity(Parent_product.name, search_string) > 0.1)
search_results = query.limit(5).all()

Then, I started the experimenting and added an index in the Parent_product class like this:

__table_args__ = (
    (
    Index(
        'ix_parent_product_name', 
        name,
        postgresql_using="gin",
        postgresql_ops={
            'name': 'gin_trgm_ops',
        }
    )
    ),
)

Which works fine, and I can see that the index has been built in the DB, but as stated above, it is not being used in the similarity-query.


Solution

  • Ok, I think I figured it out. Apparently, you need to use so called operators to get postgres to use your index. The func.similarity is not an operator, meaning I had to swap it out in the search query like this:

    db.session.execute('SET pg_trgm.similarity_threshold = 0.1;')
    query = db.session.query(Parent_product)
    query = query.filter(Parent_product.name.op("%")(search))
    query = query.order_by(Parent_product.name.op("%>")(search).desc())
    

    Now I can see the index being used. Not exactly sure what I am doing, but it works pretty fast.

    And these are the "operators" available in postgres. Don't know what half of it means, but for reference:

    https://www.postgresql.org/docs/current/pgtrgm.html#PGTRGM-OP-TABLE