Search code examples
pythonpython-3.xsqlalchemyfastapi

How to use MySQL FULLTEXT search, index and match against in SQLAlchemy ORM?


I am currently working on a project using FastAPI and MySQL as the database. And now, I need to implement a full-text index search functionality instead of using the ilike operator.

I have this table

class Publication(ORMBase):

    __tablename__ = 'publications'
    
    id = Column(Integer, primary_key=True, index=True)
    paperId = Column(String(200), nullable=False, index=True, unique=True)
    url = Column(Text, nullable=True)
    title = Column(String(255), nullable=False)
    abstract = Column(Text(length=65535), nullable=True)
    venue = Column(String(255), nullable=True)
    authors = Column(JSON, nullable=True)

and I am using this query as a base.

query.where(
    models.Publication.title.ilike("%{}%".format(search_query)) |
    models.Publication.abstract.ilike("%{}%".format(search_query)) |
    models.Publication.venue.ilike("%{}%".format(search_query)) |
    models.Publication.authors.ilike("%{}%".format(search_query))
)

What can I do from here?

I have noticed here that the answers in questions besides of being old are also related to Table and not ORM and Postgresql. Chat GPT-3 only produces errors for me.

I tried using the usual index=True, but that obviously doesn't work: sqlalchemy.exc.OperationalError: (MySQLdb.OperationalError) (1191, "Can't find FULLTEXT index matching the column list").

In the most cases I was getting maximum recursion depth exceeded error.

I would appreciate any guidance, examples or suggestions on how to structure the database schema and perform the search query.

Thank you


Solution

  • MySQL full text search

    Full-text indexes can be used only with InnoDB or MyISAM tables, and can be created only for CHAR, VARCHAR, or TEXT columns. So, not for JSON I guess (but that can be easily accommodated).

    I'm using SQLAlchemy 2.0 for my example, but the basics is the same.

    To create a full text index with MySQL, add this event listener to your code:

    event.listen(
        Base.metadata,
        "after_create",
        DDL(
            "ALTER TABLE publications "
            "ADD FULLTEXT INDEX `fulltext_publications` "
            "(`title`, `abstract`, `venue`)"
        ),
    )
    

    This will add the FULLTEXT index after your table is created. You can also just execute a raw query to create the FULLTEXT index.

    After you got data, you can use a text query to perform a FULLTEXT search in MySQL:

        # query search phrase:
        q = "apple"
    
        with Session() as session:
            stmt = (
                text(
                    "SELECT id,"
                    "       MATCH (title,abstract,venue) AGAINST (:value) AS score "
                    "FROM publications "
                    "ORDER BY score DESC"
                )
                .bindparams(value=q)
            )
    
            for row in session.execute(stmt):
                print(row.id, row.score)
    

    Full code example for your reference:

    from typing import Annotated
    
    from sqlalchemy import String, create_engine, event, text
    from sqlalchemy.dialects.mysql import JSON
    from sqlalchemy.orm import (DeclarativeBase, Mapped, MappedAsDataclass,
                                mapped_column, sessionmaker)
    from sqlalchemy.schema import DDL
    from sqlalchemy.types import TEXT
    
    engine = create_engine(
        "mysql+pymysql://root:password@localhost/test",
        echo=True,
        pool_recycle=600,
    )
    
    Session = sessionmaker(
        bind=engine,
    )
    
    
    class Base(DeclarativeBase):
        pass
    
    
    str_255 = Annotated[
        str,
        mapped_column(
            String(255),
            nullable=True,
        )
    ]
    
    
    class Publication(Base):
    
        __tablename__ = 'publications'
    
        id: Mapped[int] = mapped_column(primary_key=True)
    
        paperId: Mapped[str_255] = mapped_column(
            index=True,
            unique=True,
            nullable=False,
        )
        url: Mapped[str] = mapped_column(TEXT, nullable=True)
    
        # full text search columns
        title: Mapped[str_255] = mapped_column(nullable=False)
        abstract: Mapped[str] = mapped_column(TEXT, nullable=True)
        venue: Mapped[str_255]
        authors: Mapped[str] = mapped_column(JSON, nullable=True)
    
    
    event.listen(
        Base.metadata,
        "after_create",
        DDL(
            "ALTER TABLE publications "
            "ADD FULLTEXT INDEX `fulltext_publications` "
            "(`title`, `abstract`, `venue`)"
        ),
    )
    
    
    def full_text_search_on_publications(q: str):
        with Session() as session:
            stmt = (
                text(
                    "SELECT id,"
                    "       MATCH (title,abstract,venue) AGAINST (:value) AS score "
                    "FROM publications "
                    "ORDER BY score DESC"
                )
                .bindparams(value=q)
            )
    
            for row in session.execute(stmt):
                print(row.id, row.score)
    
    
    if __name__ == "__main__":
        # Base.metadata.create_all(engine)
        full_text_search_on_publications("apple")
    
    

    My sample data in publications:

    |id |paperId|url|title     |abstract                   |venue|authors         |
    |---|-------|---|----------|---------------------------|-----|----------------|
    |1  |1      |   |apple tree|An apple tree grows apples!|aaa  |["alice"]       |
    |2  |2      |   |b         |banana                     |bbb  |["bob", "cathy"]|
    |3  |3      |   |apple     |nothing                    |ccc  |["alice"]       |
    

    My output:

    1 0.062016263604164124
    3 0.031008131802082062
    2 0.0