
Python TypeError on Load Object using Dill

I'm trying to serialize a large (and possibly unpicklable) object to a file for later use.

dill.dump raises no complaints when writing the file:

In [1]: import as audio

In [2]: import dill

In [3]: audiofile = audio.LocalAudioFile("/Users/path/Track01.mp3")
en-ffmpeg -i "/Users/path/audio/Track01.mp3" -y -ac 2 -ar 44100 "/var/folders/X2/X2KGhecyG0aQhzRDohJqtU+++TI/-Tmp-/tmpWbonbH.wav"
Computed MD5 of file is b3820c166a014b7fb8abe15f42bbf26e
Probing for existing analysis

In [4]: with open('audio_object_dill.pkl', 'wb') as f:
   ...:     dill.dump(audiofile, f)

In [5]: 

But trying to load the .pkl file:

In [1]: import dill

In [2]: with open('audio_object_dill.pkl', 'rb') as f:
   ...:     audio_object = dill.load(f)

returns the following error:

TypeError                                 Traceback (most recent call last)
<ipython-input-2-203b696a7d73> in <module>()
      1 with open('audio_object_dill.pkl', 'rb') as f:
----> 2     audio_object = dill.load(f)

/Users/mikekilmer/Envs/GLITCH/lib/python2.7/site-packages/ in load(file)
    185     pik = Unpickler(file)
    186     pik._main_module = _main_module
--> 187     obj = pik.load()
    188     if type(obj).__module__ == _main_module.__name__: # point obj class to main
    189         try: obj.__class__ == getattr(pik._main_module, type(obj).__name__)

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.pyc in load(self)
    856             while 1:
    857                 key = read(1)
--> 858                 dispatch[key](self)
    859         except _Stop, stopinst:
    860             return stopinst.value

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.pyc in load_newobj(self)
   1081         args = self.stack.pop()
   1082         cls = self.stack[-1]
-> 1083         obj = cls.__new__(cls, *args)
   1084         self.stack[-1] = obj
   1085     dispatch[NEWOBJ] = load_newobj

TypeError: __new__() takes at least 2 arguments (1 given)

The audio object is much larger and more complex than the class in the SO answer these calls are modeled on. I'm unclear whether I need to pass a second argument through dill and, if so, what that argument would be, or how to tell whether any approach to pickling is viable for this specific object.
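
The traceback ends in load_newobj, which calls cls.__new__(cls, *args) with whatever arguments were stored at dump time. One way this kind of failure can arise is a class whose __new__ requires an extra argument but which never tells pickle what that argument is. The sketch below reproduces that situation with a hypothetical class (NeedsArg is invented for illustration, and whether this is what happens inside the audio object is an assumption). It is written for Python 3, where the error message is worded differently from the Python 2.7 one above.

```python
import pickle


class NeedsArg(object):
    """Hypothetical stand-in: __new__ requires an argument besides cls."""

    def __new__(cls, value):
        obj = super(NeedsArg, cls).__new__(cls)
        obj.value = value
        return obj


class NeedsArgFixed(NeedsArg):
    """Same class, but it tells pickle which arguments to pass to __new__."""

    def __getnewargs__(self):
        return (self.value,)


def roundtrip(obj):
    # Protocol 2 is the first protocol that rebuilds objects via cls.__new__.
    return pickle.loads(pickle.dumps(obj, protocol=2))


try:
    roundtrip(NeedsArg(42))
    load_error = None
except TypeError as exc:
    load_error = exc  # dumping succeeded; the failure happens on load

restored = roundtrip(NeedsArgFixed(42))
print(type(load_error).__name__)
print(restored.value)
```

As in the question, dumping succeeds silently and only loading fails. Defining __getnewargs__ is the standard pickle hook for supplying the missing arguments, but it has to be added to the offending class, not passed in through dill.load.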

Examining the object itself a bit:

In [4]: for k, v in vars(audiofile).items():
...:     print k, v


is_local False
defer False
numChannels 2
verbose True
endindex 13627008
analysis < object at 0x103c61bd0>
filename /Users/mikekilmer/Envs/GLITCH/glitcher/audio/Track01.mp3
convertedfile /var/folders/X2/X2KGhecyG0aQhzRDohJqtU+++TI/-Tmp-/tmp9ADD_Z.wav
sampleRate 44100
data [[0 0]
 [0 0]
 [0 0]
 [0 0]
 [0 0]
 [0 0]]

And audiofile.analysis seems to have an attribute, audiofile.analysis.source, which contains (or apparently points back to) the audio object itself, so that audiofile.analysis.source.analysis is reachable again.
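
A circular reference like this is not, by itself, a reason for pickling to fail: pickle keeps a memo of objects it has already written, so a cycle is stored once and restored as a cycle. A minimal sketch with two hypothetical classes (Track and Analysis are stand-ins invented here, not the remix package's classes):

```python
import pickle


class Analysis(object):
    """Hypothetical stand-in for the object stored in .analysis."""

    def __init__(self, source):
        self.source = source  # back-reference to the owning object


class Track(object):
    """Hypothetical stand-in for the audio object."""

    def __init__(self, filename):
        self.filename = filename
        self.analysis = Analysis(self)  # this creates the cycle


track = Track("Track01.mp3")
clone = pickle.loads(pickle.dumps(track))

# The restored analysis points back at the restored track, not at a copy.
cycle_preserved = clone.analysis.source is clone
print(cycle_preserved)
```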


  • In this case, the answer lay within the module itself.

    The LocalAudioFile class provides its own save method (which each of its instances can therefore use), called via the class or, more likely, via an instance.

    In the case of an .mp3 file, the LocalAudioFile instance consists of a pointer to a temporary .wav file, which is the decompressed version of the .mp3, along with a large amount of analysis data returned after the initial audio file has been run through the (internet-based) Echonest API. The save method calls shutil.copyfile(path_to_wave, wav_path) to save the .wav file with the same name and path as the original file linked to the audio object, and returns an error if the file already exists. It then calls pickle.dump(self, f) to save the analysis data to a file in the same directory as the original audio file.

    The LocalAudioFile object can be reintroduced simply via pickle.load().
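
    The save-then-load pattern described above can be sketched roughly as follows. This is not the remix package's code: SavableTrack, its attributes, and the ".wav" naming are assumptions made for illustration; only the shutil.copyfile and pickle.dump steps and the ".analysis.en" suffix come from the description and session output in this answer. Written for Python 3.

```python
import os
import pickle
import shutil
import tempfile


class SavableTrack(object):
    """Hypothetical stand-in; not the remix package's LocalAudioFile."""

    def __init__(self, filename, convertedfile, duration):
        self.filename = filename            # original audio file
        self.convertedfile = convertedfile  # temporary decoded .wav
        self.duration = duration            # stands in for the analysis data

    def save(self):
        wav_path = self.filename + ".wav"               # assumed naming
        analysis_path = self.filename + ".analysis.en"  # suffix seen in the session
        if os.path.exists(wav_path) or os.path.exists(analysis_path):
            raise IOError("refusing to overwrite existing files")
        # Keep a permanent copy of the decoded audio next to the original.
        shutil.copyfile(self.convertedfile, wav_path)
        # Pickle the whole object alongside it.
        with open(analysis_path, "wb") as f:
            pickle.dump(self, f)
        return analysis_path


# Set up placeholder files in a scratch directory so the sketch is runnable.
workdir = tempfile.mkdtemp()
mp3_path = os.path.join(workdir, "Track01.mp3")
tmp_wav = os.path.join(workdir, "tmp_decoded.wav")
for path in (mp3_path, tmp_wav):
    with open(path, "wb") as f:
        f.write(b"placeholder bytes")

track = SavableTrack(mp3_path, tmp_wav, 308.96)
saved_to = track.save()

# Reload the object with a plain pickle.load.
with open(saved_to, "rb") as f:
    restored = pickle.load(f)
print(restored.duration)

shutil.rmtree(workdir)
```

    Because the class controls what it writes, loading needs nothing beyond pickle.load on the saved file, opened in binary mode.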

    Here's an IPython session in which I used dill, a very useful wrapper around pickle that offers most of the standard pickle methods plus a bunch more:

    In [1]: import as audio
    In [2]: import dill
    # create the audio_file object
    In [3]: audiofile = audio.LocalAudioFile("/Users/mikekilmer/Envs/GLITCH/glitcher/audio/Track01.mp3")
    en-ffmpeg -i "/Users/path/audio/Track01.mp3" -y -ac 2 -ar 44100 "/var/folders/X2/X2KGhecyG0aQhzRDohJqtU+++TI/-Tmp-/tmp_3Ei0_.wav"
    Computed MD5 of file is b3820c166a014b7fb8abe15f42bbf26e
    Probing for existing analysis
    #call the LocalAudioFile save method
    In [4]: audiofile.save()
    Saving analysis to local file /Users/path/audio/Track01.mp3.analysis.en
    #confirm the object is valid by reading its duration attribute
    In [5]: audiofile.duration
    Out[5]: 308.96
    #delete the object - there's probably a "correct" way to do this
    In [6]: audiofile = 0
    #confirm it's no longer an audio_object
    In [7]: audiofile.duration
    AttributeError                            Traceback (most recent call last)
    <ipython-input-12-04baaeda53a4> in <module>()
    ----> 1 audiofile.duration
    AttributeError: 'int' object has no attribute 'duration'
    #open the pickled version (using dill)
    In [8]: with open('/Users/path/audio/Track01.mp3.analysis.en', 'rb') as f:
       ....:     audiofile = dill.load(f)
    #confirm it's a valid LocalAudioFile object
    In [9]: audiofile.duration
    Out[9]: 308.96

    Echonest is a very robust API and the remix package provides a ton of functionality. There's a small list of relevant links assembled here.