Search code examples
sqlalchemy

Supplement SQLAlchemy Results With Python Functions


Is there any way to use python functions in a(n) SQLAlchemy select()? I want to be able to do something like this:

select(Table.id, python_func(Table.type))

The expectation would be that the library issues a SELECT id, type FROM table statement to fetch the data. When the data is being converted to Rows, it would call the python function with the type and use the result of that function for one of the column values.

[(1, 'output_for_1'), (2, 'output_for_2'), ...]

I considered using hybrid_property. This would work fine for accessing the data for an object that is already loaded into memory, but fails when including the property in a SELECT clause. The logic in python does not have a(n) SQL equivalent, so it errors trying to generate the SQL statement. This approach might work if I could tell SQLAlchemy to just do nothing in the SQL, but I haven't found a way to do that yet.

    @hybrid_property
    def type_message(self) -> str:
        return get_lookup_detail('TYPE', self.type)

    @type_message.inplace.expression
    @classmethod
    def _type_message_exp(cls) -> ColumnElement:
        return ColumnElement()  # any way to return an "empty" clause so no SQL is rendered?

I tried setting the expression to null().label('type_message') so that the SQL would just select NULL, but then the type_message property is shown as None at runtime and the function call is not executed.


Solution

  • I've spent more time on this and came up with two options. The first one I did was to just perform post-processing on the SQL results.

    def _transformer(row: dict):
        row['type'] = get_lookup_detail('TYPE', row['type'])
        return row
    
    results = [_transformer(row._asdict()) for row in session.execute(select(Table.id, Table.type)).all()]
    

    This worked well enough for my needs, but I didn't really love the idea of adding an extra round of iteration through the results and needing to convert all of the rows to dicts. While I was in the SQLAlchemy documentation for other reasons, I came across TypeDecorator and the type_coerce() function. This gave me another idea that is more like what I was originally looking for.

    class LookupDetail(TypeDecorator):
        impl = String
        cache_ok = True
    
        def process_result_value(self, value, dialect):
            return get_lookup_detail('TYPE', value)
    
    stmt = select(Table.id,
                  type_coerce(Table.type, LookupDetail()))
    session.execute(stmt).all()
    

    My (very unscientific) testing suggests that the performance of these two approaches is very similar on small result sets (on the scale of dozens of rows). Unfortunately, I don't have a good example to test a larger volume on right now.

    At least for the smaller result sets that I tested, it seems that choosing between them is more of a personal preference. The first option is maybe a little simpler code, but is not necessarily as easy to reuse. You may need to create new transformer functions for every query. There's also a risk that this does not scale well to large result sets.

    The second option is maybe a little less intuitive at first glance, but it is taking advantage of core features of the SQLAlchemy library. It doesn't require additional iterations since the function is called while the results are being processed by SQLAlchemy, so the risk of performance degradation while scaling is less.

    One final difference between the two approaches is the ability to execute functions that require multiple columns. For example, say you wanted to have a field that indicated if the person was above or below the average salary for their position. To calculate this you probably need to call a function that needs the person's position and their salary. With the first approach this is really easy since you have access to the whole row as a dict object. For the second approach, you'd need to do something to combine the two columns in SQL, then parse out the separate values in process_result_value() before calling the function.