python deep-learning neural-network pytorch pickle

How do you implement SVoice?

I'm trying to use Facebook's SVoice to separate the different speakers in my audio file using Python. I found a library that implements it here:

https://github.com/facebookresearch/svoice

However, I'm having trouble running it. The README explains how to train on your own dataset, which I can't really do since I don't have the noises parsed out of my own audio files. It also says I can separate my own file using one of the models in the models folder, but when I follow the README and try to create a model from the toy dataset, I get the following error:

File "/mnt/c/Users/imrea/PycharmProjects/svoice/svoice/data/audio.py", line 34, in find_audio_files
    siginfo, _ = torchaudio.info(file)
TypeError: cannot unpack non-iterable AudioMetaData object

How do I run this to test the output on an audio file of my own? Has anyone used this before? Any guidance would be greatly appreciated!

Solution

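You need torchaudio version 0.6.0. The traceback shows the code unpacking the result of torchaudio.info(file) into two values, which matches the older torchaudio API that returned a pair. Newer releases return a single AudioMetaData object, hence the TypeError. Try:

    pip install torch==1.6.0+cu101 torchvision==0.7.0+cu101 torchaudio==0.6.0 -f https://download.pytorch.org/whl/torch_stable.html

The +cu101 suffix selects the CUDA 10.1 builds. This worked for me.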
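If pinning older versions is not an option, another route might be to make the failing call tolerant of both return shapes. The following is only a minimal sketch, not the repository's own code. The helper name frames_from_info is hypothetical. It assumes that the old-style result is a (siginfo, encinfo) tuple whose siginfo has length and channels attributes, and that the new-style AudioMetaData object has a num_frames attribute. It also assumes the caller only needs the per-channel frame count. The stand-in objects let it run without torchaudio installed.

```python
from types import SimpleNamespace


def frames_from_info(result):
    """Return the per-channel frame count from a torchaudio.info() result.

    Accepts either the old tuple return or the newer single metadata object.
    """
    if isinstance(result, tuple):
        # Old style (assumed): (siginfo, encinfo), where siginfo.length is the
        # total sample count summed over all channels.
        siginfo = result[0]
        return siginfo.length // siginfo.channels
    # New style (assumed): one object whose num_frames is already per channel.
    return result.num_frames


# Stand-ins for the two return shapes, so the sketch runs without torchaudio.
old_style = (SimpleNamespace(length=32000, channels=2, rate=16000), None)
new_style = SimpleNamespace(num_frames=16000, num_channels=2, sample_rate=16000)

print(frames_from_info(old_style), frames_from_info(new_style))
```

In practice you would call frames_from_info(torchaudio.info(file)) in place of the tuple unpacking on the line named in the traceback. Other parts of the code may depend on the older API as well, so installing the pinned versions remains the safer fix.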