How to use MySQL FULLTEXT search, index and match against in SQLAlchemy ORM?

I am currently working on a project using FastAPI and MySQL as the database. And now, I need to implement a full-text index search functionality instead of using the ilike operator.

I have this table

class Publication(ORMBase):

    __tablename__ = 'publications'
    
    id = Column(Integer, primary_key=True, index=True)
    paperId = Column(String(200), nullable=False, index=True, unique=True)
    url = Column(Text, nullable=True)
    title = Column(String(255), nullable=False)
    abstract = Column(Text(length=65535), nullable=True)
    venue = Column(String(255), nullable=True)
    authors = Column(JSON, nullable=True)

and I am using this query as a base.

query.where(
    models.Publication.title.ilike("%{}%".format(search_query)) |
    models.Publication.abstract.ilike("%{}%".format(search_query)) |
    models.Publication.venue.ilike("%{}%".format(search_query)) |
    models.Publication.authors.ilike("%{}%".format(search_query))
)

What can I do from here?

I have noticed here that the answers in questions besides of being old are also related to Table and not ORM and Postgresql. Chat GPT-3 only produces errors for me.

I tried using the usual index=True, but that obviously doesn't work: sqlalchemy.exc.OperationalError: (MySQLdb.OperationalError) (1191, "Can't find FULLTEXT index matching the column list").

In the most cases I was getting maximum recursion depth exceeded error.

I would appreciate any guidance, examples or suggestions on how to structure the database schema and perform the search query.

Thank you

Solution

MySQL full text search

Full-text indexes can be used only with InnoDB or MyISAM tables, and can be created only for CHAR, VARCHAR, or TEXT columns. So, not for JSON I guess (but that can be easily accommodated).

I'm using SQLAlchemy 2.0 for my example, but the basics is the same.

To create a full text index with MySQL, add this event listener to your code:

event.listen(
    Base.metadata,
    "after_create",
    DDL(
        "ALTER TABLE publications "
        "ADD FULLTEXT INDEX `fulltext_publications` "
        "(`title`, `abstract`, `venue`)"
    ),
)

This will add the FULLTEXT index after your table is created. You can also just execute a raw query to create the FULLTEXT index.

After you got data, you can use a text query to perform a FULLTEXT search in MySQL:

    # query search phrase:
    q = "apple"

    with Session() as session:
        stmt = (
            text(
                "SELECT id,"
                "       MATCH (title,abstract,venue) AGAINST (:value) AS score "
                "FROM publications "
                "ORDER BY score DESC"
            )
            .bindparams(value=q)
        )

        for row in session.execute(stmt):
            print(row.id, row.score)

Full code example for your reference:

from typing import Annotated

from sqlalchemy import String, create_engine, event, text
from sqlalchemy.dialects.mysql import JSON
from sqlalchemy.orm import (DeclarativeBase, Mapped, MappedAsDataclass,
                            mapped_column, sessionmaker)
from sqlalchemy.schema import DDL
from sqlalchemy.types import TEXT

engine = create_engine(
    "mysql+pymysql://root:password@localhost/test",
    echo=True,
    pool_recycle=600,
)

Session = sessionmaker(
    bind=engine,
)


class Base(DeclarativeBase):
    pass


str_255 = Annotated[
    str,
    mapped_column(
        String(255),
        nullable=True,
    )
]


class Publication(Base):

    __tablename__ = 'publications'

    id: Mapped[int] = mapped_column(primary_key=True)

    paperId: Mapped[str_255] = mapped_column(
        index=True,
        unique=True,
        nullable=False,
    )
    url: Mapped[str] = mapped_column(TEXT, nullable=True)

    # full text search columns
    title: Mapped[str_255] = mapped_column(nullable=False)
    abstract: Mapped[str] = mapped_column(TEXT, nullable=True)
    venue: Mapped[str_255]
    authors: Mapped[str] = mapped_column(JSON, nullable=True)


event.listen(
    Base.metadata,
    "after_create",
    DDL(
        "ALTER TABLE publications "
        "ADD FULLTEXT INDEX `fulltext_publications` "
        "(`title`, `abstract`, `venue`)"
    ),
)


def full_text_search_on_publications(q: str):
    with Session() as session:
        stmt = (
            text(
                "SELECT id,"
                "       MATCH (title,abstract,venue) AGAINST (:value) AS score "
                "FROM publications "
                "ORDER BY score DESC"
            )
            .bindparams(value=q)
        )

        for row in session.execute(stmt):
            print(row.id, row.score)


if __name__ == "__main__":
    # Base.metadata.create_all(engine)
    full_text_search_on_publications("apple")

My sample data in publications:

|id |paperId|url|title     |abstract                   |venue|authors         |
|---|-------|---|----------|---------------------------|-----|----------------|
|1  |1      |   |apple tree|An apple tree grows apples!|aaa  |["alice"]       |
|2  |2      |   |b         |banana                     |bbb  |["bob", "cathy"]|
|3  |3      |   |apple     |nothing                    |ccc  |["alice"]       |

My output:

1 0.062016263604164124
3 0.031008131802082062
2 0.0