Search code examples
paginationsqlalchemymany-to-manyleft-join

SQLAlchemy many-to-many LEFT OUTER JOIN with sorted list of entities


I've got two tables: resident and allergen and a many-to-many relationship between the two. Using SQLAlchemy over PostgreSQL, I'd like to get the list of residents ordered by their alphabetically ordered list of allergens:

 resident_id | resident_name |      allergs       
-------------+---------------+--------------------
           1 | John          | {milk,pollen,soy}
           3 | Hopkins       | {pollen,stupidity}
           2 | Mary          | {stupidity}
           4 | Lee           | {NULL}

I know this can be done in PostgreSQL using array_agg:

SELECT
  resident.id AS resident_id,
  resident.name AS resident_name,
  array_agg(allergen.NAME ORDER BY allergen.NAME) AS allergs
FROM resident
LEFT OUTER JOIN (resident_allergens AS resident_allergens_1
JOIN allergen
  ON allergen.id = resident_allergens_1.allergen_id)
  ON resident.id = resident_allergens_1.resident_id
GROUP BY resident.id
ORDER BY allergs

but as far as I can tell, SQLAlchemy does not offer support for ORDER BY clauses in the array_agg function.

So far, I've tried:

  • db.session.query(Resident).outerjoin(Resident.allergies).order_by(Allergen.name).from_self().group_by(Resident), but this does not sort the allergens properly
  • db.session.query(Resident, func.row_number().over(order_by=Allergen.name).label('rownum')).outerjoin(Resident.allergies).order_by(Allergen.name).from_self(Resident, func.min('rownum').label('maxrownum')).group_by(Resident).order_by('maxrownum').from_self(Resident), but func.min('rownum') returns u'rownum' instead of referencing the column
  • getting a sorted list of allergens, JOIN-ing the residents and then either GROUP BY resident or DISTINCT resident, but this ruins the order

Table definitions bellow:

resident_allergens = db.Table(
    'resident_allergens',
    db.Column('resident_id', db.Integer, db.ForeignKey('resident.id'), nullable=False, index=True),
    db.Column('allergen_id', db.Integer, db.ForeignKey('allergen.id'), nullable=False, index=True),
    UniqueConstraint('resident_id', 'allergen_id'))


class Allergen(db.Model):
    id = Column(db.Integer, primary_key=True)
    name = Column(db.String)

    def __init__(self, name):
        self.name = name


class Resident(db.Model):
    id = Column(db.Integer, primary_key=True)
    name = Column(db.String)

    allergies = db.relationship('Allergen',
                                collection_class=set,
                                secondary=resident_allergens,
                                backref=db.backref('residents', lazy='lazy'))

    def __init__(self, name):
        self.name = name

In case it's relevant, I'm using SQLAlchemy 0.8.5 over PostgreSQL 9.3.


Solution

  • Using SQLAlchemy's compilation extension, I was able to add my own version of array_agg, with support for ORDER BY:

    from sqlalchemy.ext.compiler import compiles
    from sqlalchemy.sql.expression import ColumnClause, _literal_as_binds
    
    
    class array_agg(ColumnClause):
        """Custom version of PostgreSQL's array_agg with support for ORDER BY.
    
        Usage: ... .order_by(array_agg(Allergen.name, order_by=Allergen.name))
        """
    
        def __init__(self, expr, order_by=None):
            self.expr = _literal_as_binds(expr)
            self.order_by = _literal_as_binds(order_by)
    
        @property
        def _from_objects(self):
            return self.expr._from_objects
    
    
    @compiles(array_agg)
    def compile_array_agg(element, compiler, **kwargs):
        head = 'array_agg(%s' % (
            compiler.process(element.expr),
        )
        if element.order_by is not None:
            tail = ' ORDER BY %s)' % compiler.process(element.order_by)
        else:
            tail = ')'
        return head + tail