python, instance-variables, class-variables

Python dataclasses.dataclass reference to variable instead of instance variable


In the following code, the default values in the definition of the Container class should produce new instance variables for c1.a and c2.a.

Instead, it looks like c1.a and c2.a are referencing the same variable.

Is @dataclass creating a class variable? That does not seem to be consistent with the intended functionality, and I cannot find anything about class variables in the documentation.

So, I think this is a bug. Can someone explain how to fix it? Should I report it on the Python bug tracker?

I suspect this is related to the way Python passes objects by reference and built-in types by value, since the b attribute (a plain float) shows the expected behavior while the a attribute (a user-defined object) is shared by reference.

Thanks!

Code:

from dataclasses import dataclass

@dataclass
class VS:
    v: float  # value
    s: float  # scale factor
    
    def scaled_value(self):
        return self.v*self.s

@dataclass
class Container:
    a: VS = VS(1, 1)
    b: float = 1

c1 = Container()
c2 = Container()

print(c1)
print(c2)

c1.a.v = -999
c1.b = -999

print(c1)
print(c2)

Output:

Container(a=VS(v=1, s=1), b=1)
Container(a=VS(v=1, s=1), b=1)
Container(a=VS(v=-999, s=1), b=-999)
Container(a=VS(v=-999, s=1), b=1)

Solution

  • In the OP's original example, a single VS object is created once, when the Container class is defined. That object is then shared by every instance of Container. This is a problem because instances of user-defined classes such as VS are mutable. Thus, mutating a through any Container object changes a for all other Container objects.

    You want a new VS object to be created every time a Container is instantiated. The default_factory argument of the field function is a good way to do this: it takes a zero-argument callable that is called each time a default value is needed. Passing a lambda lets you do this inline.

    I added a c field to Container, holding another VS instance, to illustrate that the fields are independent when defined this way.

    from dataclasses import dataclass, field
    
    @dataclass
    class VS:
        v: float  # value
        s: float  # scale factor
        
        def scaled_value(self):
            return self.v*self.s
    
    # Use a zero-argument lambda as the default_factory argument of field().
    @dataclass
    class Container:
        a: VS = field(default_factory=lambda: VS(1, 1))
        b: float = 1
        c: VS = field(default_factory=lambda: VS(1, 2))
    
    c1 = Container()
    c2 = Container()
    
    print(c1)
    print(c2)
    
    c1.a.v = -999
    c1.c.s = -999
    
    print(c1)
    print(c2)
    

    Output:

    Container(a=VS(v=1, s=1), b=1, c=VS(v=1, s=2))
    Container(a=VS(v=1, s=1), b=1, c=VS(v=1, s=2))
    Container(a=VS(v=-999, s=1), b=1, c=VS(v=1, s=-999))
    Container(a=VS(v=1, s=1), b=1, c=VS(v=1, s=2))
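
To make the difference between the two approaches easier to see, here is a minimal sketch that compares object identity directly. It also shows why b appeared to behave differently from a in the question: assigning to an attribute rebinds it on that one instance only, while mutating a shared object is visible through every reference to it. The names Box, Shared and Independent are hypothetical and exist only for this illustration. Box is a plain class rather than a dataclass because, as far as I know, Python 3.11 and later raise a ValueError for an unhashable default such as a regular dataclass instance, so the question's original code may not run unchanged there.

```python
from dataclasses import dataclass, field


class Box:
    """Hypothetical plain mutable class standing in for VS."""

    def __init__(self, v):
        self.v = v


@dataclass
class Shared:
    a: Box = Box(1)  # evaluated once, when the class body runs
    b: float = 1


@dataclass
class Independent:
    a: Box = field(default_factory=lambda: Box(1))  # called on every __init__
    b: float = 1


s1, s2 = Shared(), Shared()
i1, i2 = Independent(), Independent()

# Every Shared instance holds the very same Box object.
print(s1.a is s2.a)  # True
# default_factory builds a fresh Box for each instance.
print(i1.a is i2.a)  # False

# Mutating the shared object is visible through every reference to it ...
s1.a.v = -999
print(s2.a.v)  # -999

# ... while assignment only rebinds the attribute on that one instance.
s1.b = -999
print(s2.b)  # 1
s1.a = Box(5)
print(s2.a.v)  # -999 (s2 still refers to the original shared Box)
```

So the float is not treated specially. The question's code replaced c1.b with a new value but modified the object behind c1.a in place.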