
Appending instances of an object to a list only works with a time-consuming deepcopy. How can I change this?


I have a pymzml.run.Reader object from the pymzml package. It behaves like a generator: looping through it yields instances of the Spectrum class (also from the pymzml package). I'm comparing different instances with each other. Because the reader can't be used again once it has been looped through, I save the spectra in a list for comparison later on.

However, when I save them in a list and then loop through the list printing the ids of the spectra, it shows that only the last spectrum was saved. To clarify:

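import pymzml

spectrumList = []  # filled by test(); one entry per spectrum seen

def test(msrun):
    for spectrum in msrun:
        print(spectrum['id'])
        # Appends a reference to whatever object the reader yields.
        spectrumList.append(spectrum)
    print('-' * 20)
    for i in spectrumList:
        print(i['id'])

msrun = pymzml.run.Reader(r'JG_Ti02-C1-1_C2-01A_file1.aligned.mzML')
test(msrun)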

gives:

1
2
3
4
5
--------------------
5
5
5
5
5

The pymzml Spectrum class has a deRef() method that makes a deep copy of the spectrum, so the following does work correctly:

import pymzml

spectrumList = []

def test(msrun):
    for spectrum in msrun:
        print(spectrum['id'])
        # deRef() makes a deep copy, so each list entry is independent.
        spectrumList.append(spectrum.deRef())

msrun = pymzml.run.Reader(r'JG_Ti02-C1-1_C2-01A_file1.aligned.mzML')
test(msrun)

However, making deep copies is a major bottleneck that I'm trying to remove from my application. How can I append the spectrum instances to a list so that each entry is a different spectrum, rather than the last spectrum repeated?


Solution

  • It can't be just saving the last spectrum -- you're doing all the right things to save each object to the list.

    The problem is you're getting the same object over and over.

    Printing id(spectrum) in the loop (its memory address in CPython) will show that it is one object yielded repeatedly, with its 'id' entry and other attributes changed in place each time.

    While you don't necessarily need copy.deepcopy(), you do need to make a copy. Try copy.copy(), and look at the source of Spectrum.deRef() to see how it does its copying.

    Most likely, you do need to deRef() each one to make them independent -- otherwise, why would the class provide a special method?
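
    The aliasing described above can be reproduced without pymzml. The sketch below is only an illustration under stated assumptions: FakeSpectrum and reusing_reader are hypothetical stand-ins invented for this example, not pymzml's classes, and the real Reader may reuse its object differently. It shows a generator that yields one object mutated in place, and how a copy per iteration makes the list entries independent.

```python
import copy


class FakeSpectrum(dict):
    """Hypothetical stand-in for a spectrum object; not part of pymzml."""


def reusing_reader(n):
    """Yield the SAME object n times, changing its 'id' in place each time."""
    spectrum = FakeSpectrum()
    for i in range(1, n + 1):
        spectrum['id'] = i
        yield spectrum


# Appending the yielded object stores n references to one object.
aliased = [s for s in reusing_reader(5)]
aliased_ids = [s['id'] for s in aliased]
distinct_objects = len(set(id(s) for s in aliased))

# A shallow copy per iteration gives each list entry its own object.
copied = [copy.copy(s) for s in reusing_reader(5)]
copied_ids = [s['id'] for s in copied]

print(aliased_ids)       # [5, 5, 5, 5, 5]
print(distinct_objects)  # 1
print(copied_ids)        # [1, 2, 3, 4, 5]
```

    Note that a shallow copy only duplicates the top-level object. If the real Spectrum keeps its peaks in nested lists that the reader also mutates in place, copy.copy() may not be enough, which would explain why deRef() copies more deeply.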