So, I have been using Postgres/sqlalchemy to do full text search for a while. No problems, but now it has started to become slow because number of items in table is starting to become high (like 600k items, which takes ~10+ seconds). So, decided to look into indexing with gin.
I am able to create the index ok.
Problem is, it is not being used (0 scans when I look in pgAdmin), so I am wondering what I am doing wrong, or if I have to force use of the index or something.
My table prior to indexing (flask-sqlalchemy syntax):
class Parent_product(db.Model):
id = ....
name = db.Column(db.String)
And I would be doing searches like this (disregard legacy sqlalchemy syntax):
query = db.session.query(Parent_product)
query = query.order_by(func.similarity(Parent_product.name, search_string).desc())
query = query.filter(func.similarity(Parent_product.name, search_string) > 0.1)
search_results = query.limit(5).all()
Then, I started the experimenting and added an index in the Parent_product class like this:
__table_args__ = (
(
Index(
'ix_parent_product_name',
name,
postgresql_using="gin",
postgresql_ops={
'name': 'gin_trgm_ops',
}
)
),
)
Which works fine, and I can see that the index has been built in the DB, but as stated above, it is not being used in the similarity-query.
Ok, I think I figured it out. Apparently, you need to use so called operators to get postgres to use your index. The func.similarity
is not an operator, meaning I had to swap it out in the search query like this:
db.session.execute('SET pg_trgm.similarity_threshold = 0.1;')
query = db.session.query(Parent_product)
query = query.filter(Parent_product.name.op("%")(search))
query = query.order_by(Parent_product.name.op("%>")(search).desc())
Now I can see the index being used. Not exactly sure what I am doing, but it works pretty fast.
And these are the "operators" available in postgres. Don't know what half of it means, but for reference:
https://www.postgresql.org/docs/current/pgtrgm.html#PGTRGM-OP-TABLE