
When does Python know an initialization is a copy, so that a change in one place automatically changes another place?


If I do:

x = 1
y = [x,x,x,x]
y[1] = 2
print(y)

I get:

[1, 2, 1, 1]

BUT if I do:

x = [1,1]
y = [x,x,x,x]
y[1][0] = 2
print(y)

I get:

[[2, 1], [2, 1], [2, 1], [2, 1]]

Can someone explain the subtle difference between the two? I mean something like how Python allocates memory, so that in the first case the four elements of y read different memory locations, but in the second case the four elements of y read the same location.

And why does Python behave like this? In MATLAB, which I normally use, nothing like this happens.

Thank you.


Solution

  • All variables in Python contain references (pointers). Even simple types, such as integers, which are stored directly within variables in other languages like C, are stored using references in Python. Assigning a value to a name changes what the name refers to (points to). Understanding exactly when this happens is key to understanding why Python behaves as it does.

    Let's begin:

    a = 2           # points a to the integer object 2
    a = 3           # points a to a different integer object, 3
    b = [1, 2, 3]   # points b to a new list object [1, 2, 3]
    

    Next:

    c = a            # a and c now point to the same integer object, 3
    d = b            # b and d now point to the same list object, [1, 2, 3]
    

    So far so good, right? Now you can see why this works the way it does:

    d.append(4)      # b and d still point to the same list object, which is
                     # now [1, 2, 3, 4]
    print(b)         # prints [1, 2, 3, 4] -- how could it not?
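
One way to check whether two names refer to the same object is the `is` operator. The following is a minimal sketch reusing the names `b` and `d` from above; the extra name `e` is introduced here only for illustration, to contrast a shared reference with an actual copy.

```python
b = [1, 2, 3]
d = b                # d refers to the same list object as b
e = list(b)          # e is a new list with equal contents (a shallow copy)

print(d is b)        # True: one object, two names
print(e is b)        # False: a different object
print(e == b)        # True: but the contents are equal

d.append(4)          # mutate the shared list through the name d
print(b)             # [1, 2, 3, 4]
print(e)             # [1, 2, 3] (the copy is unaffected)
```

Note that `==` compares contents while `is` compares identity, which is why `e` can be equal to `b` without being the same object.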
    

    In truth, everything works the same way regardless of the type of the object. It's just that some types can't be changed "in place": numbers, strings, and tuples among them.

    a += 2           # a now points to the integer object 5, because you can't
                     # change 3 into 5 (integers are immutable)
    print(c)         # prints 3. c still points to 3, because you never told
                     # Python to make c point anywhere else!
    

    But:

    b.append(5)      # doesn't change what b points to, just changes the list
    b += [6]         # also (somewhat counterintuitively) doesn't change what b
                     # points to, even though it did with integers
    print(d)         # prints [1, 2, 3, 4, 5, 6] because b and d still point to
                     # the same list
    

    The case of += is a little confusing since it behaves so differently with lists and integers. However, keep in mind that += (and most other Python operations) can be customized by the objects themselves. Python first looks for an __iadd__() method on the left-hand object. Lists define one: it extends the list in place and returns the same object, which was deemed more efficient than making a copy. Integers, being immutable, define no __iadd__(), so Python falls back to __add__(), which has to return a new object. In both cases the name is then bound to whatever the method returned.
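
The difference is easy to observe by keeping a second name on the original object before using +=. This is a small sketch; the names `old_a` and `old_b` are introduced here only for illustration.

```python
a = 3
old_a = a            # second name for the integer object 3
a += 2               # produces a new integer object and rebinds a
print(a is old_a)    # False
print(old_a)         # 3

b = [1, 2, 3]
old_b = b            # second name for the list object
b += [4]             # extends the existing list in place
print(b is old_b)    # True
print(old_b)         # [1, 2, 3, 4]

b = b + [5]          # plain + builds a new list and rebinds b
print(b is old_b)    # False
print(old_b)         # [1, 2, 3, 4] (the old list is unchanged)
```

So for lists, `b += [4]` and `b = b + [4]` are not interchangeable when other names refer to the same list.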

    So to sum up:

    • Python variables (names) contain references (pointers) to objects
    • Assignment changes what object a variable (name) refers to (points to)
    • Some things that look like plain assignment (in particular augmented assignments like +=) first call a method on the object and then bind the name to the result, so they don't necessarily change what object a variable (name) refers to (points to), though they can
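
Applied to the two snippets in the question, these rules explain the difference. The sketch below repeats the question's code and then shows one possible way to get independent inner lists; the name `z` and the use of `list(x)` for copying are illustrative choices, not the only option.

```python
x = 1
y = [x, x, x, x]     # four references to the same integer object
y[1] = 2             # assignment: rebinds slot 1 of y, nothing else changes
print(y)             # [1, 2, 1, 1]

x = [1, 1]
y = [x, x, x, x]     # four references to the same inner list
y[1][0] = 2          # mutates that one shared list; no slot of y is rebound
print(y)             # [[2, 1], [2, 1], [2, 1], [2, 1]]
print(y[0] is y[3])  # True: every slot is the same object

# To get independent inner lists, copy x once per slot.
x = [1, 1]
z = [list(x) for _ in range(4)]
z[1][0] = 2
print(z)             # [[1, 1], [2, 1], [1, 1], [1, 1]]
```

In the first snippet `y[1] = 2` is an assignment to a list slot, while in the second `y[1][0] = 2` reaches through slot 1 and changes the list that all four slots share.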