Arrange a Python object to discover what name it is assigned to

TL;DR summary

How can I write foo = MyClass() and have the MyClass object know that its name is foo?

Full Question

I want to write a library in Python that allows constructing a tree of objects, which support recursive traversal and naming. I would like to make this easy to use by automating the discovery of an object's name and parent object at the time it is created and added to the parent object. In this simple example, the name and parent object have to be explicitly passed into each object constructor:

class TreeNode:
    def __init__(self, name, parent):
        self.name = name
        self.children = []
        self.parent = parent
        if parent is None:
            self.fullname = name
        else:
            parent.register_child(self)

    def register_child(self, child):
        self.children.append(child)
        child.fullname = self.fullname + "." + child.name

    def recursive_print(self):
        print(self.fullname)
        for child in self.children:
            child.recursive_print()

class CustomNode(TreeNode):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.foo = TreeNode(name="foo", parent=self)
        self.bar = TreeNode(name="bar", parent=self)

root = TreeNode(name="root", parent=None)
root.a = CustomNode(name="a", parent=root)

root.recursive_print()

output:

root
root.a
root.a.foo
root.a.bar

What I would like to be able to do is omit the explicit name and parent arguments, something like:

class CustomNode(TreeNode):
    def __init__(self):
        self.foo = TreeNode()
        self.bar = TreeNode()

root = TreeNode(parent=None)
root.a = CustomNode()

I have a partial solution at the moment where I have TreeNode.__setattr__() check to see if it is assigning a new TreeNode and if so name it and register it; but one shortcoming is that TreeNode.__init__() cannot know its name or parent until after it returns, which would be preferable.

I am wondering if there is some neat way to do what I want, using metclasses or some other feature of the language.

Solution

Ordinarily, there is no way to do that. And in Python, a "name" is just a reference to the actual object - it could have several names pointing to it as in x = y = z = MyNode(), or no name at all - if the object is put inside a data structure like in mylist.append(MyNode()) .

So, keep in mind that even native structures from the language itself require one to repeat the name as string in cases like this - for example, when creating namedtuples (point = namedtuple("point", "x y")) or when creating classes programatically, by calling type as in MyClass = type("MyClass", (), {})

Of course, Python having the introspection capabilities it does, it would be possible for the constructor from TreeNode to retrieve the function from which it was called, by using sys._getframe(), then retrieve the text of the source code, and if it would be a simple well formed line like self.foo = TreeNode(), to extract the name foo from there manipulating the string. you should not do that, due to the considerations given above. (and the source code may not always be available to the running program, in which case this method would not work)

If you are always creating the nodes inside methods, the second most straighforward thing to do seems to be adding a short method to do it. The most straightforward is still typing the name twice in cases like this, just as you are doing.

class CustomNode(TreeNode):
    def __init__(self):
        self.add_node("foo")
        self.add_noded("bar")
        excrafurbate(self.foo)  # attribute can be used, as it is set in the method
    def add_node(self, name):
        setattr(self, name, TreeNode(name=name, parent=self))

There are some exceptions in the language for typing names twice, though. The one meant for this kind of things is that fixed attributes in a class can have a special method (__set_name__) through which they get to know their name. However, they are set per_class, and if you need separate instances of TreeNode in each instance of CustomNode, some other code have to be put in so that the new nodes are instantiated in a lazy way, or when the container class is instantiated.

In this case,it looks like it is possible to simply create a new TreeNode instance whenever the attribute is accessed in a new instance: the mechanism of __set_name__ is the descriptor protocol - the same used by Python property builtin. If new nodes are created empty by default it is easy to do - and you then control their attributes:


class ClsTreeNode:
    def __set_name__(self, owner, name):
        self.name = name
        
    def __get__(self, instance, owner):
        if instance is None:
            return self
        value = getattr(instance, "_" + self.name, None)
        if value is None:
            value = TreeNode(name = self.name, parent=instance)
            setattr(instance, "_" + self.name, value)
        return value
    
    def __set__(self, instance, value):
        # if setting a new node is not desired, just raise a ValueError
        if not isinstance(value, TreeNode):
            raise TypeError("...")
        # adjust name in parent inside parent node
        # oterwise create a new one.
        # ...or accept the node content, and create the new TreeNode here, 
        # one is free to do whatever wanted.
        value.name = self.name
        value.patent = instance
        setattr(instance, "_" + self.name, value)



class  CustomNode(TreeNode):
    foo = ClsTreeNode()
    bar = ClsTreeNode()

This, as stated, will only work if ClsTreeNode is a class attribute - check the descriptor protocol docs for more details.

The other way of not having to type the name twice would again fall in the "hackish, do not use", that would be: abuse the class statement. With a proper custom metaclass, the class foo(metaclass=TreeNodeMeta): pass statement does not need to create a new class, and can instead return any new object - and the call to the metaclass __new__ method will be passed the name of the class. It would still have to resort to inspecting the Frame object in the call stack to findout about its parent (while, as can be seen above, by using the descriptor protocol, one does have the parent object for free).