python mysql full-text-search sqlalchemy

Proper use of MySQL full text search with SQLAlchemy


I would like to be able to full text search across several text fields of one of my SQLAlchemy mapped objects. I would also like my mapped object to support foreign keys and transactions.

I plan to use MySQL to run the full text search. However, I understand that MySQL can only run full text search on a MyISAM table, which does not support transactions and foreign keys.

In order to accomplish my objective I plan to create two tables. My code will look something like this:

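from sqlalchemy import Column, Integer, String, Table, Text
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

# Transactional table mapped by the ORM
class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String(50))
    description = Column(Text)

# MyISAM copy of the columns to be searched
users_myisam = Table('users_myisam', Base.metadata,
                     Column('id', Integer),
                     Column('name', String(50)),
                     Column('description', Text),
                     mysql_engine='MyISAM')

# Assumes an engine has been bound to Base.metadata
conn = Base.metadata.bind.connect()
conn.execute("CREATE FULLTEXT INDEX idx_users_ftxt "
             "ON users_myisam (name, description)")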

Then, to search I will run this:

q = 'monkey'
# q is interpolated directly into the SQL string (see question 3 below)
ft_search = users_myisam.select("MATCH (name,description) AGAINST ('%s')" % q)
result = ft_search.execute()
for row in result:
    print(row)

This seems to work, but I have a few questions:

  1. Is my approach of creating two tables to solve my problem reasonable? Is there a standard/better/cleaner way to do this?

  2. Is there a SQLAlchemy way to create the fulltext index, or am I best to just directly execute "CREATE FULLTEXT INDEX ..." as I did above?

  3. Looks like I have a SQL injection problem in my search/match against query. How can I do the select the "SQLAlchemy way" to fix this?

  4. Is there a clean way to join the users_myisam select/match against right back to my user table and return actual User instances, since this is what I really want?

  5. In order to keep my users_myisam table in sync with my mapped object user table, does it make sense for me to use a MapperExtension on my User class, and set the before_insert, before_update, and before_delete methods to update the users_myisam table appropriately, or is there some better way to accomplish this?

Thanks, Michael


Solution

  • Is my approach of creating two tables to solve my problem reasonable? Is there a standard/better/cleaner way to do this?

    I've not seen this use case attempted before, as developers who value transactions and constraints tend to use PostgreSQL in the first place. I understand that may not be possible in your specific scenario.

    Is there a SQLAlchemy way to create the fulltext index, or am I best to just directly execute "CREATE FULLTEXT INDEX ..." as I did above?

    conn.execute() is fine, though if you want something slightly more integrated you can use the DDL() construct. See http://docs.sqlalchemy.org/en/rel_0_8/core/schema.html?highlight=ddl#customizing-ddl for details.
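As a rough sketch of the DDL() route, the statement can be attached to the table's "after_create" event so that it runs whenever metadata.create_all() creates the table. The table layout is copied from the question; the standalone `metadata` object and the `create_fulltext` name are assumptions made for illustration.

```python
from sqlalchemy import DDL, Column, Integer, MetaData, String, Table, Text, event

metadata = MetaData()

users_myisam = Table(
    "users_myisam", metadata,
    Column("id", Integer),
    Column("name", String(50)),
    Column("description", Text),
    mysql_engine="MyISAM",
)

# execute_if() restricts the statement to MySQL; other backends skip it.
create_fulltext = DDL(
    "CREATE FULLTEXT INDEX idx_users_ftxt ON users_myisam (name, description)"
).execute_if(dialect="mysql")

# Emit the DDL right after users_myisam itself has been created.
event.listen(users_myisam, "after_create", create_fulltext)
```

With this in place, metadata.create_all(engine) against a MySQL engine should create both the table and the index, and no separate conn.execute() call is needed.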

    Looks like I have a SQL injection problem in my search/match against query. How can I do the select the "SQLAlchemy way" to fix this?

    Note: this recipe is only for MATCH against multiple columns simultaneously. If you have just one column, the match() operator is simpler.
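For the single-column case, a minimal sketch of the match() operator follows. It compiles the statement against the MySQL dialect so the generated SQL can be inspected without a database connection. The table definition is taken from the question; the choice of the description column is arbitrary.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, Text
from sqlalchemy.dialects import mysql

metadata = MetaData()

users_myisam = Table(
    "users_myisam", metadata,
    Column("id", Integer),
    Column("name", String(50)),
    Column("description", Text),
    mysql_engine="MyISAM",
)

# match() on one column; the search string becomes a bound parameter.
stmt = users_myisam.select().where(users_myisam.c.description.match("monkey"))

# Compiling needs only the dialect, not a live MySQL server.
compiled = stmt.compile(dialect=mysql.dialect())
print(compiled)
print(compiled.params)
```

As far as I know, the MySQL dialect renders match() in boolean mode, so results may differ from a plain natural-language MATCH.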

    Most basically, you could use the text() construct:

    from sqlalchemy import text, bindparam
    
    users_myisam.select(
      text("MATCH (name,description) AGAINST (:value)", 
           bindparams=[bindparam('value', q)])
    )
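In newer SQLAlchemy releases, text() no longer accepts a bindparams keyword and select() on a table no longer takes a where clause as its argument. A sketch of the same idea using the bindparams() method of the text construct and an explicit where(), assuming the users_myisam table from the question:

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, Text, text

metadata = MetaData()

users_myisam = Table(
    "users_myisam", metadata,
    Column("id", Integer),
    Column("name", String(50)),
    Column("description", Text),
    mysql_engine="MyISAM",
)

q = "monkey"  # stands in for untrusted user input

# :value is a bound parameter, so q travels separately from the SQL text.
criterion = text("MATCH (name, description) AGAINST (:value)").bindparams(value=q)
stmt = users_myisam.select().where(criterion)

print(stmt)                   # SQL with the :value placeholder
print(stmt.compile().params)  # the values that will be bound
```

Because the search string never becomes part of the SQL text, this addresses the injection problem in the original "%s" interpolation.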
    

    More comprehensively, you could define a custom construct:

    from sqlalchemy.ext.compiler import compiles
    from sqlalchemy.sql.expression import ColumnElement
    from sqlalchemy import literal
    
    class Match(ColumnElement):
        def __init__(self, columns, value):
            self.columns = columns
            # literal() turns the search string into a bound parameter
            self.value = literal(value)
    
    @compiles(Match)
    def _match(element, compiler, **kw):
        return "MATCH (%s) AGAINST (%s)" % (
                   ", ".join(compiler.process(c, **kw) for c in element.columns),
                   compiler.process(element.value, **kw)
                 )
    
    my_table.select().where(Match([my_table.c.a, my_table.c.b], "some value"))
    

    docs:

    http://docs.sqlalchemy.org/en/rel_0_8/core/compiler.html
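To check what a multi-column construct of this kind emits, one option is to compile a statement against the MySQL dialect and print it. The sketch below is a self-contained variant of the Match recipe. It subclasses ColumnElement, which to my understanding SQLAlchemy 1.4 and later require for objects passed to where(), and it sets inherit_cache = False as an assumption to opt out of statement caching. The table is the one from the question.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, Text, literal
from sqlalchemy.dialects import mysql
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import ColumnElement


class Match(ColumnElement):
    """MATCH (col, ...) AGAINST (value) across several columns."""

    inherit_cache = False  # assumption: skip statement caching for this construct

    def __init__(self, columns, value):
        self.columns = list(columns)
        self.value = literal(value)  # rendered as a bound parameter


@compiles(Match)
def _compile_match(element, compiler, **kw):
    cols = ", ".join(compiler.process(c, **kw) for c in element.columns)
    return "MATCH (%s) AGAINST (%s)" % (cols, compiler.process(element.value, **kw))


metadata = MetaData()

users_myisam = Table(
    "users_myisam", metadata,
    Column("id", Integer),
    Column("name", String(50)),
    Column("description", Text),
    mysql_engine="MyISAM",
)

stmt = users_myisam.select().where(
    Match([users_myisam.c.name, users_myisam.c.description], "monkey")
)

# Compiling needs only the dialect, not a live MySQL server.
compiled = stmt.compile(dialect=mysql.dialect())
print(compiled)
print(compiled.params)
```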

    Is there a clean way to join the users_myisam select/match against right back to my user table and return actual User instances, since this is what I really want?

    You should probably create a UserSearch class for the users_myisam table, map it just like User, and use relationship() to link the two classes together. Then simple operations like this are possible:

    session.query(User).join(User.search_table).\
                        filter(Match([UserSearch.x, UserSearch.y], "some value"))
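A sketch of what such a mapping might look like, assuming SQLAlchemy 1.4 or newer. The UserSearch class name and the search_table attribute follow the query above. The shared primary key, the explicit primaryjoin (used because no ForeignKey is declared on the MyISAM table) and the search_users() helper are assumptions made for illustration.

```python
from sqlalchemy import Column, Integer, String, Text
from sqlalchemy.orm import declarative_base, foreign, relationship

Base = declarative_base()


class UserSearch(Base):
    """Search copy of the text columns, stored in the MyISAM table."""

    __tablename__ = "users_myisam"
    __table_args__ = {"mysql_engine": "MyISAM"}

    # Holds the same value as users.id; populated through the relationship.
    id = Column(Integer, primary_key=True, autoincrement=False)
    name = Column(String(50))
    description = Column(Text)


class User(Base):
    __tablename__ = "users"

    id = Column(Integer, primary_key=True)
    name = Column(String(50))
    description = Column(Text)

    # One-to-one link; foreign() marks UserSearch.id as the dependent column.
    search_table = relationship(
        UserSearch,
        primaryjoin=lambda: User.id == foreign(UserSearch.id),
        uselist=False,
    )


def search_users(session, criterion):
    """Return User objects whose search row satisfies the SQL criterion."""
    return session.query(User).join(User.search_table).filter(criterion).all()
```

The criterion can be any SQL expression over UserSearch columns, for example a multi-column MATCH construct like the one described earlier in this answer.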
    

    In order to keep my users_myisam table in sync with my mapped object user table, does it make sense for me to use a MapperExtension on my User class, and set the before_insert, before_update, and before_delete methods to update the users_myisam table appropriately, or is there some better way to accomplish this?

    MapperExtension is deprecated, so at a minimum you'd use the event API instead. In most cases, though, it's better to apply object mutations outside of the flush process. In this case, I'd use the constructor for User (or alternatively the init event) together with a basic @validates decorator, which receives values for the target attributes on User and copies those values into User.search_table.
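The constructor plus @validates approach could be sketched roughly as follows, assuming SQLAlchemy 1.4 or newer. The UserSearch class, the shared primary key, the explicit primaryjoin and the cascade setting are assumptions for illustration. The sketch also assumes every existing User row already has a matching search row.

```python
from sqlalchemy import Column, Integer, String, Text
from sqlalchemy.orm import declarative_base, foreign, relationship, validates

Base = declarative_base()


class UserSearch(Base):
    __tablename__ = "users_myisam"
    __table_args__ = {"mysql_engine": "MyISAM"}

    id = Column(Integer, primary_key=True, autoincrement=False)
    name = Column(String(50))
    description = Column(Text)


class User(Base):
    __tablename__ = "users"

    id = Column(Integer, primary_key=True)
    name = Column(String(50))
    description = Column(Text)

    # The cascade removes the search row when its User is deleted.
    search_table = relationship(
        UserSearch,
        primaryjoin=lambda: User.id == foreign(UserSearch.id),
        uselist=False,
        cascade="all, delete-orphan",
    )

    def __init__(self, **kwargs):
        # Create the companion row first so the validator has a target.
        self.search_table = UserSearch()
        super().__init__(**kwargs)

    @validates("name", "description")
    def _copy_to_search(self, key, value):
        # Called on every assignment, before any flush takes place.
        setattr(self.search_table, key, value)
        return value
```

Assigning user.name or user.description then updates the search row in the same unit of work. Note that MyISAM itself is not transactional, so a rollback will not undo changes already written to users_myisam.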

    Overall, if you've been learning SQLAlchemy from another source (like the O'Reilly book), it's out of date by many years, and I'd focus on the current online documentation.