The following code was supposed to clarify how Python class variables behave,
but somehow it opens more questions than it solves.
The class Bodyguard has the variable protect, which is a list that by default contains the king.
The classes AnnoyingBodyguard and Bureaucrat change it.
Guards that protect specific people shall be called specific (bg_prime, bg_foreign, ...).
The others shall be called generic (bg1, bg2, bg3).
For specific guards the changes affect only those initialized after the change.
For generic guards the changes affect all of them, no matter when they were initialized.
Why the before/after difference for specific guards? Why the specific/generic difference?
These differences are somewhat surprising, but I find the following even stranger.
Given two lists a and b, one might think that these operations will always have the same result:
reassign: a = a + b
add-assign: a += b
append: for x in b: a.append(x)
Why do they cause completely different results when used in Bodyguard.__init__?
Only the results using reassign make any sense.
They can be seen below and in reassign_good.py.
The results for add-assign and append are quite useless, and I do not show them here.
But they can be seen in addassign_bad.py and append_bad.py.
class Bodyguard:
    protect = ['the king']

    def __init__(self, *args):
        if args:
            self.protect = self.protect + list(args)
##################################################################################
bg1 = Bodyguard()
bg_prime = Bodyguard('the prime minister')
bg_foobar = Bodyguard('the secretary of foo', 'the secretary of bar')
assert bg1.protect == ['the king']
assert bg_prime.protect == ['the king', 'the prime minister']
assert bg_foobar.protect == [
    'the king', 'the secretary of foo', 'the secretary of bar'
]
##################################################################################
class AnnoyingBodyguard(Bodyguard):
    Bodyguard.protect = ['his majesty the king']
bg2 = Bodyguard()
bg_foreign = Bodyguard('the foreign minister')
# The king's title was updated for all generic guards.
assert bg1.protect == bg2.protect == ['his majesty the king']
# And for specific guards initialized after AnnoyingBodyguard was defined.
assert bg_foreign.protect == ['his majesty the king', 'the foreign minister']
# But not for specific guards initialized before AnnoyingBodyguard was defined.
assert bg_prime.protect == ['the king', 'the prime minister']
assert bg_foobar.protect == [
    'the king', 'the secretary of foo', 'the secretary of bar'
]
##################################################################################
class Bureaucrat:
    def __init__(self, name):
        Bodyguard.protect.append(name)
malfoy = Bureaucrat('Malfoy')
bg3 = Bodyguard()
bg_paper = Bodyguard('the secretary of paperwork')
# Malfoy was added for all generic guards.
assert bg1.protect == bg2.protect == bg3.protect == [
    'his majesty the king', 'Malfoy'
]
# And for specific guards initialized after Malfoy:
assert bg_paper.protect == [
    'his majesty the king', 'Malfoy', 'the secretary of paperwork'
]
# But not for specific guards initialized before Malfoy:
assert bg_prime.protect == ['the king', 'the prime minister']
assert bg_foreign.protect == [
    'his majesty the king', 'the foreign minister'
]
Edit: Based on the comments and answers, I added the script reassign_better.py,
where the differences between generic and specific guards are removed.
The main class should look like this:
class Bodyguard:
    protect = ['the king']

    def __init__(self, *args):
        self.protect = self.protect[:]  # force reassign also for generic guards
        if args:
            self.protect = self.protect + list(args)
Perhaps examples will clarify this. This is, in my view, the KEY point to understanding Python behind the scenes.
a = [1,2,3]
b = a
c = a
At this point, our program has exactly ONE list object. There happen to be three names bound to that one list. Modifying any of them modifies the one list, and will be visible everywhere:
b.append(4)
print(c)
That prints [1, 2, 3, 4]. However, if we do:
b = b + [5]
print(a)
print(b)
That creates a BRAND NEW list object and binds it to the name b. The names a and c are still bound to the original, so that prints
[1, 2, 3, 4]
[1, 2, 3, 4, 5]
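The same distinction is probably what separates add-assign from reassign in the question. Here is a minimal sketch, with the illustrative names original and alias (not from the question): for lists, += extends the existing object in place, whereas + builds a new object and rebinds the name.

```python
original = [1, 2, 3]
alias = original                 # a second name bound to the same list object

alias += [4]                     # in-place: extends the one existing list
shared_after_iadd = alias is original

alias = alias + [5]              # builds a new list and rebinds alias to it
shared_after_add = alias is original

print(original)  # [1, 2, 3, 4]
print(alias)     # [1, 2, 3, 4, 5]
```

So a += b is not always interchangeable with a = a + b: when the object is mutable and other names are bound to it, the first is visible through all of those names and the second is not.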
The way I like to think about this is that there are two different "spaces" in Python: there is an object space, filled with thousands of anonymous objects that do not have a name, and there is a namespace, which contains names that are bound to objects. It's important to recognize this. Names do not have values. They are merely bound to objects. And this includes EVERY name: variables, functions, classes, modules, etc.
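Applied to the question, this might be sketched as follows. The class is the one from the question; the names generic and specific and the helper variables are only illustrative. A guard created without arguments never gets its own protect name, so looking it up falls through to the class, while a guard created with arguments gets an instance name bound to a new list.

```python
class Bodyguard:
    protect = ['the king']       # one list object, bound to a class-level name

    def __init__(self, *args):
        if args:
            # Builds a new list and binds it to a name on the instance.
            self.protect = self.protect + list(args)


generic = Bodyguard()
specific = Bodyguard('the prime minister')

generic_names = dict(vars(generic))      # names stored on the generic instance
specific_names = dict(vars(specific))    # names stored on the specific instance

generic_uses_class_list = generic.protect is Bodyguard.protect
specific_uses_class_list = specific.protect is Bodyguard.protect

# Rebinding the class-level name only affects lookups that fall through to it.
Bodyguard.protect = ['his majesty the king']

print(generic.protect)   # ['his majesty the king']
print(specific.protect)  # ['the king', 'the prime minister']
```

With self.protect += list(args) instead, the class list itself would be extended in place and the instance name bound to that same object, which would explain why every guard then sees every addition.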
Note that this confusion does not actually require separate names. Take, for example, the very common error:
a = [[0] * 10] * 10
Many would think this creates 10 different lists. That's not so. This creates exactly TWO lists: one that contains 10 zeros, and one that contains 10 references to that list. So if you do:
a[5][5] = 7
that change is seen in all ten elements of a.
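A short sketch of the pitfall and one common way around it; the names shared and independent are illustrative. A list comprehension evaluates the inner expression once per row, so each row is a separate list object.

```python
# Ten references to one and the same inner list.
shared = [[0] * 10] * 10
shared[5][5] = 7
rows_changed_shared = sum(row[5] == 7 for row in shared)

# Ten separate inner lists, one created per loop iteration.
independent = [[0] * 10 for _ in range(10)]
independent[5][5] = 7
rows_changed_independent = sum(row[5] == 7 for row in independent)

print(rows_changed_shared)       # 10
print(rows_changed_independent)  # 1
```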