I am currently working on a project using FastAPI and MySQL as the database. And now, I need to implement a full-text index search functionality instead of using the ilike
operator.
I have this table
class Publication(ORMBase):
__tablename__ = 'publications'
id = Column(Integer, primary_key=True, index=True)
paperId = Column(String(200), nullable=False, index=True, unique=True)
url = Column(Text, nullable=True)
title = Column(String(255), nullable=False)
abstract = Column(Text(length=65535), nullable=True)
venue = Column(String(255), nullable=True)
authors = Column(JSON, nullable=True)
and I am using this query as a base.
query.where(
models.Publication.title.ilike("%{}%".format(search_query)) |
models.Publication.abstract.ilike("%{}%".format(search_query)) |
models.Publication.venue.ilike("%{}%".format(search_query)) |
models.Publication.authors.ilike("%{}%".format(search_query))
)
What can I do from here?
I have noticed here that the answers in questions besides of being old are also related to Table and not ORM and Postgresql. Chat GPT-3 only produces errors for me.
I tried using the usual index=True
, but that obviously doesn't work:
sqlalchemy.exc.OperationalError: (MySQLdb.OperationalError) (1191, "Can't find FULLTEXT index matching the column list").
In the most cases I was getting maximum recursion depth exceeded error.
I would appreciate any guidance, examples or suggestions on how to structure the database schema and perform the search query.
Thank you
Full-text indexes can be used only with InnoDB or MyISAM tables, and can be
created only for CHAR
, VARCHAR
, or TEXT
columns. So, not for JSON
I
guess (but that can be easily accommodated).
I'm using SQLAlchemy 2.0 for my example, but the basics is the same.
To create a full text index with MySQL, add this event listener to your code:
event.listen(
Base.metadata,
"after_create",
DDL(
"ALTER TABLE publications "
"ADD FULLTEXT INDEX `fulltext_publications` "
"(`title`, `abstract`, `venue`)"
),
)
This will add the FULLTEXT index after your table is created. You can also just execute a raw query to create the FULLTEXT index.
After you got data, you can use a text query to perform a FULLTEXT search in MySQL:
# query search phrase:
q = "apple"
with Session() as session:
stmt = (
text(
"SELECT id,"
" MATCH (title,abstract,venue) AGAINST (:value) AS score "
"FROM publications "
"ORDER BY score DESC"
)
.bindparams(value=q)
)
for row in session.execute(stmt):
print(row.id, row.score)
Full code example for your reference:
from typing import Annotated
from sqlalchemy import String, create_engine, event, text
from sqlalchemy.dialects.mysql import JSON
from sqlalchemy.orm import (DeclarativeBase, Mapped, MappedAsDataclass,
mapped_column, sessionmaker)
from sqlalchemy.schema import DDL
from sqlalchemy.types import TEXT
engine = create_engine(
"mysql+pymysql://root:password@localhost/test",
echo=True,
pool_recycle=600,
)
Session = sessionmaker(
bind=engine,
)
class Base(DeclarativeBase):
pass
str_255 = Annotated[
str,
mapped_column(
String(255),
nullable=True,
)
]
class Publication(Base):
__tablename__ = 'publications'
id: Mapped[int] = mapped_column(primary_key=True)
paperId: Mapped[str_255] = mapped_column(
index=True,
unique=True,
nullable=False,
)
url: Mapped[str] = mapped_column(TEXT, nullable=True)
# full text search columns
title: Mapped[str_255] = mapped_column(nullable=False)
abstract: Mapped[str] = mapped_column(TEXT, nullable=True)
venue: Mapped[str_255]
authors: Mapped[str] = mapped_column(JSON, nullable=True)
event.listen(
Base.metadata,
"after_create",
DDL(
"ALTER TABLE publications "
"ADD FULLTEXT INDEX `fulltext_publications` "
"(`title`, `abstract`, `venue`)"
),
)
def full_text_search_on_publications(q: str):
with Session() as session:
stmt = (
text(
"SELECT id,"
" MATCH (title,abstract,venue) AGAINST (:value) AS score "
"FROM publications "
"ORDER BY score DESC"
)
.bindparams(value=q)
)
for row in session.execute(stmt):
print(row.id, row.score)
if __name__ == "__main__":
# Base.metadata.create_all(engine)
full_text_search_on_publications("apple")
My sample data in publications:
|id |paperId|url|title |abstract |venue|authors |
|---|-------|---|----------|---------------------------|-----|----------------|
|1 |1 | |apple tree|An apple tree grows apples!|aaa |["alice"] |
|2 |2 | |b |banana |bbb |["bob", "cathy"]|
|3 |3 | |apple |nothing |ccc |["alice"] |
My output:
1 0.062016263604164124
3 0.031008131802082062
2 0.0