
Python - Data Attributes vs Class Attributes and Instance Attributes - When to use Data Attributes?


I am learning Python and have started a chapter on "classes" and also class/instance attributes. The chapter starts off with a very basic example of creating an empty class

class Contact:
   pass

x=Contact()

So an empty class is created and an instance of the class is created. Then it also throws in the following line of code

x.name='Mr.Roger'

So this threw me for a loop as the class definition is totally empty with no variables. Similarly the object is created with no variables.

It's explained that apparently this is a "data attribute". I tried to google this and most documentation speaks to class/instance attributes, though I was able to find a reference to data attributes here: https://docs.python.org/3/tutorial/classes.html#instance-objects

In my very basic mind - What I am seeing happening is that an empty object is instantiated. Then seemingly new variables can then be created and attached to this object (in this case x.name). I am assuming that we can create any number of attributes in this manner so we could even do

x.firstname='Roger'
x.middlename='Sam'
x.lastname='Jacobs'

etc.

Since there are already class and instance attributes - I am confused why one would do this and for what situations or use-cases? Is this not a recommended way of creating attributes or is this frowned upon?

If I create a second object and then attach other attributes to it - How can I find all the attributes attached to this object or any other object that is implemented in a similar way?


Solution

  • Python is a very dynamic language. Classes act like molds: they create instances according to a specific shape. But unlike other languages where shapes are fixed, in Python you can (nearly) always modify that shape afterwards.

    I had never heard of "data attribute" in this context, so I'm not surprised that you found little to explain this behavior.
    Instead, I recommend the Python data model documentation. Under "Class instances":

    [...] A class instance has a namespace implemented as a dictionary which is the first place in which attribute references are searched. When an attribute is not found there, and the instance’s class has an attribute by that name, the search continues with the class attributes.
    [...]
    Special attributes: __dict__ is the attribute dictionary; __class__ is the instance’s class.
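
As a minimal sketch of the lookup order quoted above: the instance's own dictionary is searched first, then the class. The kind attribute and the greet method are made up for illustration and are not part of the question's Contact class.

```python
class Contact:
    kind = "person"  # class attribute, stored in Contact.__dict__ (hypothetical)

    def greet(self):  # hypothetical method, also stored on the class
        return "Hello, " + self.name


x = Contact()
x.name = "Mr.Roger"  # instance attribute, stored in x.__dict__

print(x.__dict__)                   # {'name': 'Mr.Roger'}
print("kind" in x.__dict__)         # False: not found on the instance...
print(x.kind)                       # person (...so the class is searched next)
print("greet" in Contact.__dict__)  # True: methods live on the class

x.kind = "robot"      # creates an instance attribute that shadows the class one
print(x.kind)         # robot
print(Contact.kind)   # person (the class attribute is unchanged)
```

Assigning through the instance never modifies the class attribute; it only adds an entry to that one instance's dictionary.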

    Python looks simple on the surface, but what happens when you do a.my_value is rather complex. In the simple cases, my_value is an instance variable, which is usually set in the class's __init__ method, like so:

    class Something:
        def __init__(self, parameter):
            self.my_value = parameter  # storing the parameter in an instance variable (self)
    
    a = Something(1)
    b = Something(2)
    
    # instance variables are not shared (by default)
    print(a.my_value)  # 1
    print(b.my_value)  # 2
    a.my_value = 10
    b.my_value = 20
    print(a.my_value)  # 10
    print(b.my_value)  # 20
    

    But it would have worked without the __init__:

    class Something:
        pass  # nothing special
    
    a = Something()
    a.my_value = 1  # we have to set it ourselves, because there is no more __init__
    b = Something()
    b.my_value = 2  # same
    
    # and we get the same results as before :
    print(a.my_value)  # 1
    print(b.my_value)  # 2
    a.my_value = 10
    b.my_value = 20
    print(a.my_value)  # 10
    print(b.my_value)  # 20
    

    Because each instance uses a dictionary (by default) to store its own attributes, and you can edit this dictionary, you can add or change the fields of any object at any moment. Methods are not stored there: they live on the class, in the class's own __dict__. This flexibility is both very handy sometimes, and very annoying other times.

    Example of the instance's __dict__ attribute:

    class Something:
        pass  # nothing special
    
    a = Something()
    print(a.__dict__)  # {}
    a.my_value = 1
    print(a.__dict__)  # {'my_value': 1}
    a.my_value = 10
    print(a.__dict__)  # {'my_value': 10}
    

    Because my_value did not exist before, it got added to the __dict__. Then it just got modified.

    And if we create another Something:

    b = Something()
    print(a.__dict__)  # {'my_value': 10}
    print(b.__dict__)  # {}
    

    They were created with the same mold (the Something class) but one got modified afterwards.
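
To answer the last part of the question (how to find the attributes attached to a given object), here is a small sketch using the built-ins vars() and dir(). The attribute name other_value is only an example.

```python
class Something:
    pass


a = Something()
a.my_value = 10
b = Something()
b.other_value = "hello"  # example attribute, attached after creation

# vars(obj) returns obj.__dict__, i.e. only what is stored on that instance
print(vars(a))  # {'my_value': 10}
print(vars(b))  # {'other_value': 'hello'}

# dir(obj) also lists names coming from the class, so filter out the
# double-underscore names if you only want the "regular" ones
public_names = [name for name in dir(b) if not name.startswith("__")]
print(public_names)  # ['other_value']
```

Note that vars() only works on objects that have a __dict__, which is the case for instances of ordinary classes like this one.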

    The usual way to set attributes on instances is in the __init__ method:

    class Something:
        def __init__(self, param):
            print(self.__dict__)  # {}
            self.my_value = param
            print(self.__dict__)  # {'my_value': 1}
    
    a = Something(1)
    print(a.__dict__)  # {'my_value': 1}
    

    It does exactly what we did before: it adds a new entry to the instance's __dict__. In that sense, __init__ is not much more than a convention for where to put all your field declarations, but you can do without it.
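
As an illustration of "adding an entry to the instance's __dict__", the sketch below shows three ways of doing it that end up in the same place. This assumes an ordinary class, with no __slots__, properties, or custom __setattr__; the attribute names are arbitrary.

```python
class Something:
    pass


a = Something()
a.first = 1              # plain attribute assignment
setattr(a, "second", 2)  # same effect, with the name given as a string
a.__dict__["third"] = 3  # writing to the underlying dictionary directly

print(a.__dict__)  # {'first': 1, 'second': 2, 'third': 3}
print(a.third)     # 3

# getattr with a default avoids an AttributeError for names that are not set
print(getattr(a, "missing", None))  # None

# attributes can be removed again, which deletes the dictionary entry
del a.first
print(a.__dict__)  # {'second': 2, 'third': 3}
```

In everyday code the plain assignment form is the one to prefer; the other two are mostly useful when the attribute name is only known at runtime.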

    This comes from the fact that almost everything in Python is a dynamic object that you can edit at any time. For example, that's the way modules work too:

    import sys
    this_module = sys.modules[__name__]
    
    print(this_module.__dict__)  # {... a bunch of things ...}
    
    MODULE_VAR = 4
    
    print(this_module.__dict__)  # {... a bunch of things ..., 'MODULE_VAR': 4}
    

    This is a core feature of Python, and its dynamic nature sometimes makes things easy. For example, it enables duck typing, monkey patching, introspection, ... But in a large codebase without coding rules, you can quickly end up with a mess of undeclared attributes everywhere. Nowadays, we try to write less clever, more reliable code, so adding new attributes to instances outside of __init__ is indeed frowned upon.
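
If you want Python to enforce this rather than rely on convention, one option (not the only one) is __slots__, which is part of why the shape is only "nearly" always modifiable. A minimal sketch, reusing the attribute names from the question:

```python
class Contact:
    # Only these attribute names are allowed on instances
    __slots__ = ("firstname", "lastname")

    def __init__(self, firstname, lastname):
        self.firstname = firstname
        self.lastname = lastname


x = Contact("Roger", "Jacobs")
x.lastname = "Smith"  # fine: declared in __slots__

try:
    x.middlename = "Sam"  # not declared, so Python rejects it
except AttributeError as error:
    print("rejected:", error)

# Instances of a class using __slots__ like this have no per-instance __dict__
print(hasattr(x, "__dict__"))  # False
```

The trade-off is that you lose the flexibility described above, so this is usually reserved for classes where a fixed set of fields is really what you want.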