Search code examples
pythontreesqlalchemyhybridself-reference

SQLAlchemy: Recursive hybrid property in a tree node


I've the following self-referential (tree) node, and wish to filter/sort by the calculated properties uuid_path and name_path:

class Node (db.Model):
    id = db.Column (db.Integer, db.Sequence ('node_id_seq'), primary_key=True)

    ###########################################################################

    root_id = db.Column (db.Integer, db.ForeignKey (id, ondelete='CASCADE'),
        index=True)

    nodes = db.relationship ('Node',
        cascade='all, delete-orphan', lazy='dynamic',
        primaryjoin='Node.id==Node.root_id',
        backref=db.backref ('root', remote_side=id))

    ###########################################################################

    _uuid = db.Column (db.String (36), nullable=False, index=True, unique=True,
        name = 'uuid')
    _name = db.Column (db.Unicode (256), nullable=False, index=True,
        name = 'name')

    ###########################################################################

    @hybrid_property
    def uuid (self):
        return self._uuid

    @hybrid_property
    def name (self):
        return self._name
    @name.setter
    def name (self, value):
        self._name = value

    ###########################################################################

    def __init__ (self, name, root, mime=None, uuid=None):

        self.root = root
        self._uuid = uuid if uuid else str (uuid_random ())
        self._name = unicode (name) if name is not None else None

    def __repr__ (self):

        return u'<Node@%x: %s>' % (self.id if self.id else 0, self._name)

    ###########################################################################

    @hybrid_property
    def uuid_path (self):
        node, path = self, []
        while node:

            path.insert (0, node.uuid)
            node = node.root

        return os.path.sep.join (path)

    @hybrid_property
    def name_path (self):
        node, path = self, []
        while node:

            path.insert (0, node.name)
            node = node.root

        return os.path.sep.join (path)

    ###########################################################################

If I get a Node instance subnode and execute e.g. subnode.name_path then I get correctly e.g. root/subnode. But if I try to use Node.name_path (for filtering/sorting) then SQLAlchemy complains:

Neither 'InstrumentedAttribute' object nor 'Comparator' object associated with Node.root has an attribute 'name'.

I'm pretty sure I've to introduce something like:

class Node (db.Model):

    @hybrid_property
    def name_path (self):
        node, path = self, []
        while node:

            path.insert (0, node.name)
            node = node.root

        return os.path.sep.join (path)

    @name_path.expression
    def name_path (cls):
        ## Recursive SQL expression??

But I struggle to get a correct definition for @name_path.expression (or @uuid_path.expression); it should somehow instruct SQL to deliver the path from the root node down to the node in question.

What I don't understand is why this is required, since I've told SQLAlchemy to iteratively calculate the path values. Thanks for your help.


Solution

  • All right after tweaking with PostgreSQL and SQLAlchemy, I think I've a solution: (1) First, I'd to write the query as a function in SQL, and (2) second provide the correct SQLAlchemy glue:

    The SQL part uses a WITH RECURSIVE CTE:

    CREATE OR REPLACE FUNCTION name_path(node)
      RETURNS text AS
    $BODY$
    
    WITH RECURSIVE graph (id, root_id, id_path, name_path) AS (
        SELECT n.id, n.root_id, ARRAY[n.id], ARRAY[n.name]
        FROM node n
    UNION
        SELECT n.id, n.root_id, id_path||ARRAY[n.id], name_path||ARRAY[n.name]
        FROM node n, graph g
        WHERE n.root_id = g.id)
    
    SELECT array_to_string (g.name_path, '/','.*')
    FROM graph g
    WHERE (g.id_path[1] = $1.base_id OR g.root_id IS NULL)
    AND (g.id = $1.id)
    
    $BODY$
      LANGUAGE sql STABLE
      COST 100;
    ALTER FUNCTION name_path(node)
      OWNER TO webed;
    

    and the SQLAlchemy side looks like this:

    class NamePathColumn (ColumnClause):
        pass
    
    @compiles (NamePathColumn)
    def compile_name_path_column (element, compiler, **kwargs):
        return 'node.name_path' ## something missing?
    

    and

    class Node (db.Model):
    
        def get_path (self, field):
    
            @cache.version (key=[self.uuid, 'path', field])
            def cached_path (self, field):
    
                if self.root:
                    return self.root.get_path (field) + [getattr (self, field)]
                else:
                    return [getattr (self, field)]
    
            if field == 'uuid':
                return cached_path (self, field)
            else:
                return cached_path.uncached (self, field)
    
        @hybrid_property
        def name_path (self):
            return os.path.sep.join (self.get_path (field='name'))
    
        @name_path.expression
        def name_path (cls):
            return NamePathColumn (cls)
    

    I avoid accessing Node.name_path on the DB if I'm on the pure Python side, but probably it would be faster if I would. The only thing I'm still not so sure about is in compile_name_path_column I do not consider any of the element, compiler, **kwargs parameters, which makes me a little suspicious.

    I just cooked this up, after tinkering for about 1.5 days with SA & PG, so it's very much possible that there is still room for improvement. I'd very much appreciate any remarks w.r.t. this approach. Thanks.