Tags: python, shallow-copy

Keeping lists separate and avoiding shallow copy in Python


Here's my pseudo-code:

class bar(object):
    def __init__(self, aList):
        self.workingList = aList

class foo(bar):
    def __init__(self, aList):
        bar.__init__(self, aList)
        self.initialList = aList
    def clock(self):
        modify(self.workingList)    # placeholder: changes the list in place
        if something:               # placeholder condition
            self.workingList = self.initialList  # reset to the initial list
        assert self.initialList == range(10)

A normal run of the program looks like this:

a = range(10)
b = foo(a)
for i in xrange(10000):
    b.clock()

This did not work because after

self.workingList = self.initialList

self.workingList referred to the same object instead of a copy of it. So when I then call

modify(self.workingList)

it also modifies self.initialList, which is meant to stay unchanged.
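A minimal sketch of that aliasing in isolation, assuming plain Python 3 lists; the names `initial` and `working` are hypothetical stand-ins for the two attributes. Assignment binds a second name to the same object, so an in-place change through one name is visible through the other.

```python
# Assignment binds a second name to the same list object; nothing is copied.
initial = list(range(10))
working = initial

print(working is initial)   # True: one object, two names

# An in-place change through either name is visible through both.
working[2] = 99
print(initial[2])           # 99
```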

My solution was to replace

self.workingList = self.initialList

with

self.workingList = [i for i in self.initialList]

and I even replaced

self.initialList = aList

with:

self.initialList = [j for j in aList]
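For reference, a small Python 3 sketch (the name `initial` is hypothetical) suggesting that the comprehension does create an independent list, just like the other common ways of copying a list. These are all shallow copies, which is enough here because the elements are immutable integers. So the copying itself is not where the problem lies.

```python
import copy

initial = list(range(10))

# Each of these builds a new list object containing the same elements.
copies = [
    [i for i in initial],   # list comprehension, as in the question
    list(initial),          # list constructor
    initial[:],             # full slice
    copy.copy(initial),     # copy module (shallow copy)
]

for c in copies:
    assert c is not initial  # a distinct object...
    assert c == initial      # ...with equal contents
    c[2] = 99                # mutate the copy only

print(initial == list(range(10)))  # True: the original is untouched
```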

Although I believe this should work in principle, it didn't fix the problem in practice. I verify this with the assert statement. I seem to be misunderstanding some Python semantics. Could somebody please explain?

Thanks!

EDIT: Note that I understand the difference between a deep copy and a shallow copy; that's not what my question is about. I think something goes wrong in how I use this with class instantiation/inheritance. I am working offline on a minimal, complete, and verifiable example (MCVE) that I can post here. Stay tuned for that update.

UPDATE:

I found the C in MCVE and here's the bug:

class bar(object):
    def __init__(self, aList):
        self.workingList = aList

class foo(bar):
    def __init__(self, aList):
        bar.__init__(self, aList)
        self.initialList = list(aList)
    def clock(self):
        self.workingList[2] = max(self.workingList[3], 2)
        #print self.workingList
        if True:
            self.workingList = list(self.initialList)
        try:
            assert self.initialList == range(10)
        except AssertionError:
            print self.initialList
            print self.workingList
            raise


a = range(10)
for i in xrange(10):
    b = foo(a)
    for j in xrange(100000):
        b.clock()
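To see where the assertion eventually fails, here is a Python 3 port of the MCVE above, offered as a tracing sketch. `xrange` becomes `range`, comparisons use `list(range(10))`, the assert is replaced by prints, and the second instance `c` is a hypothetical name added for illustration. It suggests that the very first `clock()` call writes into `a` itself, because `bar.__init__` stored `a` without copying it.

```python
class bar(object):
    def __init__(self, aList):
        # Stores a reference to the caller's list, not a copy.
        self.workingList = aList

class foo(bar):
    def __init__(self, aList):
        bar.__init__(self, aList)
        self.initialList = list(aList)  # this one is a copy

    def clock(self):
        # Until the reset below runs, self.workingList is still the caller's list.
        self.workingList[2] = max(self.workingList[3], 2)
        self.workingList = list(self.initialList)

a = list(range(10))
b = foo(a)
print(b.workingList is a)   # True: bar.__init__ aliased a
b.clock()
print(a)                    # [0, 1, 3, 3, 4, 5, 6, 7, 8, 9]
print(b.workingList is a)   # False: the reset replaced it with a copy
c = foo(a)
print(c.initialList == list(range(10)))  # False: built from the mutated a
```

Only the first instance's first call touches `a`; every later instance then starts from the damaged list, which is why the assertion fires on the second pass of the outer loop.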

Solution

  • After a lot of debugging I found the cause of this bug. One possible fix is to copy the list in bar's constructor:

    class bar(object):
        def __init__(self, aList):
            self.workingList = list(aList)  # copy instead of storing aList itself
    

    The reason is that Python passes object references by value. Here's a link that explains it pretty well.

    In my code, this means that as long as self.workingList has not been reassigned, it is the very same list object as aList, so when I change it in foo.clock():

    modify(self.workingList)
    

    it actually modifies the object that aList refers to (outside of the constructor!!), so when I create a new instance with foo(aList), I am passing an already-modified version of aList into it. This is super tricky!

    The trickiest part is that reassigning self.workingList does not cause this bug (although it would with pass-by-reference), but mutating self.workingList in place does, because Python passes object references by value (the link explains all of this in detail).
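A minimal Python 3 sketch of "object references passed by value" (often called call by sharing), followed by the fix above applied to a Python 3 port of the MCVE. The functions `rebind` and `mutate` and the variable `nums` are hypothetical names for illustration, and the inner loop is shortened from 100000 to 1000 iterations.

```python
def rebind(lst):
    # Rebinding the parameter only changes what the local name refers to.
    lst = [100]

def mutate(lst):
    # Mutating in place changes the object the caller also refers to.
    lst[0] = 100

nums = [1, 2, 3]
rebind(nums)
print(nums)   # [1, 2, 3]
mutate(nums)
print(nums)   # [100, 2, 3]

class bar(object):
    def __init__(self, aList):
        self.workingList = list(aList)  # the fix: keep a private copy

class foo(bar):
    def __init__(self, aList):
        bar.__init__(self, aList)
        self.initialList = list(aList)

    def clock(self):
        self.workingList[2] = max(self.workingList[3], 2)
        self.workingList = list(self.initialList)
        assert self.initialList == list(range(10))

a = list(range(10))
for i in range(10):
    b = foo(a)
    for j in range(1000):   # fewer iterations than the original 100000
        b.clock()
print(a == list(range(10)))   # True: the caller's list is never touched
```

Copying in `bar.__init__` means no instance ever holds the caller's list, so every `foo(a)` starts from the same untouched data.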