
How do you implement SVoice?


I'm trying to use Facebook's SVoice to separate the different speakers in my audio file using Python. I found the repository that implements it here:

https://github.com/facebookresearch/svoice

However, I'm having trouble running it. The README explains how to train on my own dataset, which I can't really do since I don't have the noises separated out in my own audio files. It also says I can separate my own file using one of the models in the models folder, but when I follow the README and try to create a model from the toy dataset, I get the following error:

File "/mnt/c/Users/imrea/PycharmProjects/svoice/svoice/data/audio.py", line 34, in find_audio_files
    siginfo, _ = torchaudio.info(file)
TypeError: cannot unpack non-iterable AudioMetaData object

How do I run this to test the output on an audio file of my own? Has anyone used this before? Any guidance would be greatly appreciated!


Solution

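  • You need torchaudio version 0.6.0. The error occurs because newer torchaudio releases return a single AudioMetaData object from torchaudio.info, while the svoice code unpacks the result as a two-element tuple. Try: pip install torch==1.6.0+cu101 torchvision==0.7.0+cu101 torchaudio==0.6.0 -f https://download.pytorch.org/whl/torch_stable.html (the +cu101 builds target CUDA 10.1). This worked for me.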
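If pinning torchaudio is not an option, a possible alternative (not the answerer's method, and untested against the svoice repository) is to make the unpacking tolerant of both return styles. The sketch below assumes the legacy signal-info object exposes `length`, `channels`, and `rate`, and that `AudioMetaData` exposes `num_frames` and `sample_rate`. The helper name `audio_length_and_rate` is hypothetical, and stand-in objects are used so the snippet runs without torchaudio installed.

```python
from types import SimpleNamespace


def audio_length_and_rate(info_result):
    """Return (frames_per_channel, sample_rate) from a torchaudio.info result.

    Handles two shapes (attribute names are assumptions, see the text above):
    - legacy tuple: (signal_info, encoding_info)
    - newer single metadata object
    """
    if isinstance(info_result, tuple):
        # Legacy style: length counts samples across all channels.
        siginfo = info_result[0]
        return siginfo.length // siginfo.channels, int(siginfo.rate)
    # Newer style: a single metadata object with per-channel frame count.
    return info_result.num_frames, info_result.sample_rate


# Stand-ins that mimic the two return shapes (hypothetical values).
legacy = (SimpleNamespace(length=32000, channels=2, rate=16000.0), None)
modern = SimpleNamespace(num_frames=16000, num_channels=2, sample_rate=16000)

print(audio_length_and_rate(legacy))
print(audio_length_and_rate(modern))
```

In svoice/data/audio.py, the failing line could then pass the result of `torchaudio.info(file)` through such a helper instead of unpacking it directly. Other parts of the code may still depend on the older API, so downgrading as described above remains the simpler route.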