I'm trying to save a large and (possibly very) unpicklable object to a file for later use.
There are no complaints on the dill.dump() side:
In [1]: import echonest.remix.audio as audio
In [2]: import dill
In [3]: audiofile = audio.LocalAudioFile("/Users/path/Track01.mp3")
en-ffmpeg -i "/Users/path/audio/Track01.mp3" -y -ac 2 -ar 44100 "/var/folders/X2/X2KGhecyG0aQhzRDohJqtU+++TI/-Tmp-/tmpWbonbH.wav"
Computed MD5 of file is b3820c166a014b7fb8abe15f42bbf26e
Probing for existing analysis
In [4]: with open('audio_object_dill.pkl', 'wb') as f:
   ...:     dill.dump(audiofile, f)
...:
In [5]:
But trying to load the .pkl file:
In [1]: import dill
In [2]: with open('audio_object_dill.pkl', 'rb') as f:
   ...:     audio_object = dill.load(f)
...:
raises the following error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-2-203b696a7d73> in <module>()
1 with open('audio_object_dill.pkl', 'rb') as f:
----> 2 audio_object = dill.load(f)
3
/Users/mikekilmer/Envs/GLITCH/lib/python2.7/site-packages/dill-0.2.2.dev-py2.7.egg/dill/dill.pyc in load(file)
185 pik = Unpickler(file)
186 pik._main_module = _main_module
--> 187 obj = pik.load()
188 if type(obj).__module__ == _main_module.__name__: # point obj class to main
189 try: obj.__class__ == getattr(pik._main_module, type(obj).__name__)
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.pyc in load(self)
856 while 1:
857 key = read(1)
--> 858 dispatch[key](self)
859 except _Stop, stopinst:
860 return stopinst.value
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.pyc in load_newobj(self)
1081 args = self.stack.pop()
1082 cls = self.stack[-1]
-> 1083 obj = cls.__new__(cls, *args)
1084 self.stack[-1] = obj
1085 dispatch[NEWOBJ] = load_newobj
TypeError: __new__() takes at least 2 arguments (1 given)
The audio object is much more complex (and larger) than the class in the SO answer these calls were modeled on, and I'm unclear whether I need to pass a second argument via dill, and if so, what that argument would be, or how to tell whether any approach to pickling is viable for this specific object.
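The last frame of the traceback shows what goes wrong: dumping never calls __new__, but at load time the NEWOBJ handler calls cls.__new__(cls, *args), and here args was empty. That suggests some class in the pickled object graph defines a __new__ that needs an extra argument; which echonest class that is has not been verified here. The minimal sketch below reproduces the failure with a hypothetical class, NeedsSource (not part of echonest), and shows the standard-library remedy: a __getnewargs__ method that tells pickle what to pass to __new__. Since dill's load uses pickle's Unpickler (as the traceback shows), the same hook should apply there too.

```python
import pickle


class NeedsSource(object):
    """Hypothetical class whose __new__ requires an extra argument."""

    def __new__(cls, source):
        obj = super(NeedsSource, cls).__new__(cls)
        obj.source = source
        return obj


class NeedsSourceFixed(NeedsSource):
    """Same class, but tells pickle which arguments __new__ needs."""

    def __getnewargs__(self):
        # Protocol 2+ stores this tuple and replays cls.__new__(cls, *args) on load.
        return (self.source,)


def roundtrip(obj):
    # Protocol 2 emits the NEWOBJ opcode handled by load_newobj in the traceback.
    return pickle.loads(pickle.dumps(obj, protocol=2))


if __name__ == "__main__":
    try:
        roundtrip(NeedsSource("Track01.mp3"))
    except TypeError as exc:
        # Dumping succeeded; loading called NeedsSource.__new__(cls) without a source.
        print("load failed: %s" % exc)

    print(roundtrip(NeedsSourceFixed("Track01.mp3")).source)
```

On Python 3 the TypeError wording differs (a "missing required positional argument" message), but it comes from the same __new__ call.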
Examining the object itself a bit:
In [4]: for k, v in vars(audiofile).items():
   ...:     print k, v
...:
returns:
is_local False
defer False
numChannels 2
verbose True
endindex 13627008
analysis <echonest.remix.audio.AudioAnalysis object at 0x103c61bd0>
filename /Users/mikekilmer/Envs/GLITCH/glitcher/audio/Track01.mp3
convertedfile /var/folders/X2/X2KGhecyG0aQhzRDohJqtU+++TI/-Tmp-/tmp9ADD_Z.wav
sampleRate 44100
data [[0 0]
[0 0]
[0 0]
...,
[0 0]
[0 0]
[0 0]]
And audiofile.analysis seems to contain an attribute called audiofile.analysis.source, which contains (or apparently points back to) audiofile.analysis.source.analysis.
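The back-reference is not a problem in itself: pickle memoizes every object it has already written, so a cycle such as analysis.source pointing back to the parent comes back as a cycle rather than recursing forever. To narrow down which attribute does break the round trip, one option is to dump and load each entry of vars() separately. This is a minimal sketch with stdlib pickle; find_unpicklable, Analysis and AudioFileLike are hypothetical stand-ins, not echonest names.

```python
import pickle


def find_unpicklable(obj, protocol=2):
    """Round-trip each attribute of obj through pickle on its own.

    Returns a dict mapping attribute name -> exception raised while dumping
    or loading it. An empty dict means every attribute survived.
    """
    failures = {}
    for name, value in vars(obj).items():
        try:
            pickle.loads(pickle.dumps(value, protocol=protocol))
        except Exception as exc:  # record whatever dump or load raised
            failures[name] = exc
    return failures


class Analysis(object):
    """Hypothetical stand-in for echonest's AudioAnalysis."""


class AudioFileLike(object):
    """Hypothetical stand-in for LocalAudioFile, including the back-reference."""

    def __init__(self):
        self.sampleRate = 44100
        self.analysis = Analysis()
        self.analysis.source = self  # points back to the parent, as observed above


if __name__ == "__main__":
    original = AudioFileLike()
    restored = pickle.loads(pickle.dumps(original, protocol=2))
    # The memo restores the cycle: source is the restored parent, not a copy.
    print(restored.analysis.source is restored)
    print(find_unpicklable(original))
```

One caveat: because analysis.source refers back to the parent, probing analysis on its own drags the whole object along with it, so that entry fails whenever the parent does; recursing into leaf attributes gives a sharper answer.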
In this case, the answer lay within the module itself.
The LocalAudioFile class provides its own save method (which each of its instances can therefore use), called via LocalAudioFile.save or, more likely, the_audio_object_instance.save().
In the case of an .mp3 file, the LocalAudioFile instance consists of a pointer to a temporary .wav file, which is the decompressed version of the .mp3, along with a whole bunch of analysis data returned for the initial audio file after it has been submitted to the (internet-based) Echonest API.
LocalAudioFile.save calls shutil.copyfile(path_to_wave, wav_path) to save the .wav file under the same name and path as the original file linked to the audio object, and it fails with an error if that file already exists. It then calls pickle.dump(self, f) to save the analysis data to a file in the same directory as the original audio file.
The LocalAudioFile object can then be restored simply via pickle.load().
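The save-then-load pattern described above can be sketched with a toy class. This is a minimal sketch assuming only the behavior described in this answer (copy the temporary .wav with shutil.copyfile, refuse to overwrite an existing copy, then pickle.dump the object itself); ToyAudioFile and load_saved are hypothetical names, the ".wav" naming rule is an assumption, and this is not the echonest implementation.

```python
import os
import pickle
import shutil


class ToyAudioFile(object):
    """Hypothetical stand-in for LocalAudioFile; NOT the echonest code."""

    def __init__(self, filename, convertedfile, duration):
        self.filename = filename            # original .mp3 path
        self.convertedfile = convertedfile  # temporary decoded .wav
        self.duration = duration            # stand-in for the analysis data

    def save(self):
        # Assumption: the permanent copy reuses the original path with a .wav suffix.
        wav_path = os.path.splitext(self.filename)[0] + ".wav"
        if os.path.exists(wav_path):
            raise IOError("File already exists: %s" % wav_path)
        shutil.copyfile(self.convertedfile, wav_path)
        # The session output shows the pickle landing in "<filename>.analysis.en".
        analysis_path = self.filename + ".analysis.en"
        with open(analysis_path, "wb") as f:
            pickle.dump(self, f)
        return analysis_path


def load_saved(analysis_path):
    # Binary mode is required for pickle data on Python 3 and advisable on Python 2.
    with open(analysis_path, "rb") as f:
        return pickle.load(f)
```

Because the object is pickled with plain pickle.dump, loading it back with either pickle.load or dill.load works the same way for a class like this.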
Here's an IPython session in which I used dill, a very useful wrapper around pickle that offers most of the standard pickle methods plus a bunch more:
In [1]: import echonest.remix.audio as audio
In [2]: import dill
# create the audio_file object
In [3]: audiofile = audio.LocalAudioFile("/Users/mikekilmer/Envs/GLITCH/glitcher/audio/Track01.mp3")
en-ffmpeg -i "/Users/path/audio/Track01.mp3" -y -ac 2 -ar 44100 "/var/folders/X2/X2KGhecyG0aQhzRDohJqtU+++TI/-Tmp-/tmp_3Ei0_.wav"
Computed MD5 of file is b3820c166a014b7fb8abe15f42bbf26e
Probing for existing analysis
#call the LocalAudioFile save method
In [4]: audiofile.save()
Saving analysis to local file /Users/path/audio/Track01.mp3.analysis.en
# confirm the object is valid by checking its duration attribute
In [5]: audiofile.duration
Out[5]: 308.96
# "delete" the object by rebinding the name (del audiofile is the usual way)
In [6]: audiofile = 0
#confirm it's no longer an audio_object
In [7]: audiofile.duration
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-12-04baaeda53a4> in <module>()
----> 1 audiofile2.duration
AttributeError: 'int' object has no attribute 'duration'
#open the pickled version (using dill)
In [8]: with open('/Users/path/audio/Track01.mp3.analysis.en', 'rb') as f:
   ....:     audiofile = dill.load(f)
....:
#confirm it's a valid LocalAudioFile object
In [8]: audiofile.duration
Out[8]: 308.96
Echonest is a very robust API, and the remix package provides a ton of functionality. There's a small list of relevant links assembled here.