
Shallow copy in Python


I am a little confused about how shallow copy works. My understanding is that when we do new_obj = copy.copy(mutable_obj), a new object is created whose elements still point to the elements of the old object.

Here is an example of where I am confused:

## assignment
i = [1, 2, 3]
j = i
id(i[0]) == id(j[0])  # True
i[0] = 10
i  # [10, 2, 3]
j  # [10, 2, 3]

## shallow copy
import copy
k = copy.copy(i)
k   # [10, 2, 3]
id(i) == id(k)  # False (as these are two separate objects)
id(i[0]) == id(k[0])  # True (as they reference the same location, right?)
i[0] = 100
id(i[0]) == id(k[0])  # False (why did that value in that location change?)
id(i[:]) == id(k[:])  # True  (why is this still true if an element just changed?)
i   # [100, 2, 3]
k   # [10, 2, 3]

In shallow copy, isn't k[0] just pointing to i[0] similar to assignment? Shouldn't k[0] change when i[0] changes?

I expect these to be the same because of the following example:

i = [1, 2, [3]]
k = copy.copy(i)
i  # [1, 2, [3]]
k  # [1, 2, [3]]
i[2].append(4)
i  # [1, 2, [3, 4]]
k  # [1, 2, [3, 4]]
id(i[0]) == id(k[0])  # True
id(i[2]) == id(k[2])  # True
id(i[:]) == id(k[:])  # True

Solution

  • id(i) == id(k) # False (as these are two separate objects)

    Correct.

    id(i[0]) == id(k[0]) # True (as they reference the same location, right?)

    Correct.

    i[0] = 100

    id(i[0]) == id(k[0]) # False (why did that value in that location change?)

    It changed because you changed it in the previous line. i[0] was pointing to 10, but you changed it to point to 100. Therefore, i[0] and k[0] no longer point to the same object.

    Pointers (references) are one-way. 10 does not know what is pointing to it, and neither does 100. They are just objects in memory. So if you change what i's first element points to, k doesn't care (since k and i are not the same list). k's first element still points to what it always pointed to.
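
The same behaviour can be checked with the `is` operator, which compares object identity directly instead of comparing `id` values. A minimal sketch, with the variable names taken from the question:

```python
import copy

i = [10, 2, 3]
k = copy.copy(i)          # new outer list, same element objects

print(i is k)             # False: two distinct list objects
print(i[0] is k[0])       # True: both slots refer to the same int object

i[0] = 100                # rebinds slot 0 of i only; k is untouched

print(i)                  # [100, 2, 3]
print(k)                  # [10, 2, 3]
print(i[0] is k[0])       # False: the slots now refer to different objects
```

Assigning to `i[0]` replaces a reference stored in `i`; it never touches the object that reference used to point to, which is why `k` still sees 10.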

    id(i[:]) == id(k[:]) # True (why is this still true if an element just changed?)

    This one's a bit more subtle, but note that:

    >>> id([1,2,3,4,5]) == id([1,2,3])
    True
    

    whereas

    >>> x = [1,2,3,4,5]
    >>> y = [1,2,3]
    >>> id(x) == id(y)
    False
    

    It has to do with some subtleties of garbage collection and id, and it's answered in depth here: Unnamed Python objects have the same id.

    Long story short, when you evaluate id([1,2,3,4,5]) == id([1,2,3]), the first thing that happens is that [1,2,3,4,5] is created. Then the call to id returns its identity (in CPython, its address in memory). However, [1,2,3,4,5] is anonymous: nothing else refers to it, so its reference count drops to zero and CPython reclaims it immediately. Then another anonymous object, [1,2,3], is created, and CPython happens to place it in the spot that was just freed, so id returns the same number. [1,2,3] is also immediately deleted and cleaned up. The same applies to id(i[:]) == id(k[:]): each slice builds a new temporary list that is discarded as soon as id returns. If you store the references, though, both objects are alive at the same time, and then their ids are different.
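
One way to compare the two slices reliably is to bind each of them to a name, so that both lists are alive at the same time. A small sketch (the names `s1` and `s2` are made up for illustration):

```python
i = [100, 2, 3]
k = [10, 2, 3]

s1 = i[:]                 # each slice builds a brand-new list
s2 = k[:]

print(s1 is s2)           # False: two live objects never share an identity
print(id(s1) == id(s2))   # False, for the same reason

# With temporaries, the first list may already be freed when the second one
# is created, so the result below is an implementation detail (often True in
# CPython) and should not be relied on.
print(id(i[:]) == id(k[:]))
```

Because the outcome of the last line depends on how the interpreter reuses memory, it says nothing about whether `i` and `k` share elements.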

    Mutables example

    The same thing happens with mutable objects if you reassign them. Here's an example:

    >>> import copy
    >>> a = [ [1,2,3], [4,5,6], [7,8,9] ]
    >>> b = copy.copy(a)
    >>> a[0].append(123)
    >>> b[0]
    [1, 2, 3, 123]
    >>> a
    [[1, 2, 3, 123], [4, 5, 6], [7, 8, 9]]
    >>> b
    [[1, 2, 3, 123], [4, 5, 6], [7, 8, 9]]
    >>> a[0] = [123]
    >>> b[0]
    [1, 2, 3, 123]
    >>> a
    [[123], [4, 5, 6], [7, 8, 9]]
    >>> b
    [[1, 2, 3, 123], [4, 5, 6], [7, 8, 9]]
    

    The difference is when you say a[0].append(123), we're modifying whatever a[0] is pointing to. It happens to be the case that b[0] is pointing to the same object (a[0] and b[0] are references to the same object).

    But if you point a[0] to a new object (through assignment, as in a[0] = [123]), then b[0] and a[0] no longer point to the same place.
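
The difference between mutating a shared object and rebinding a slot can also be checked directly with `is`. A short sketch that reuses `a` and `b` from the example above:

```python
import copy

a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
b = copy.copy(a)

print(a[0] is b[0])   # True: the shallow copy shares the inner lists

a[0].append(123)      # mutation: visible through both a[0] and b[0]
print(b[0])           # [1, 2, 3, 123]
print(a[0] is b[0])   # True: still the same inner list

a[0] = [123]          # rebinding: only a's slot 0 changes
print(b[0])           # [1, 2, 3, 123]
print(a[0] is b[0])   # False: a[0] now refers to a different list
```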