I'm trying to use Facebook's SVoice to separate the different speakers in my audio file using Python. I found the implementation here:
https://github.com/facebookresearch/svoice
However, I'm having trouble running it. The README explains how to train on my own dataset, which I can't really do since I don't have the noises separated out in my own audio files. It also explains how to separate my own file using one of the models in the models folder, but when I follow the README and try to create a model from the toy dataset, I get the following error:
File "/mnt/c/Users/imrea/PycharmProjects/svoice/svoice/data/audio.py", line 34, in find_audio_files
siginfo, _ = torchaudio.info(file)
TypeError: cannot unpack non-iterable AudioMetaData object
How do I run this to test the output on an audio file of my own? Has anyone used this before? Any guidance would be greatly appreciated!
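The error means your installed torchaudio is newer than the svoice code expects: torchaudio.info returns a single AudioMetaData object in your version, while svoice unpacks its result as a two-element tuple. You need torchaudio version 0.6.0. Try:
pip install torch==1.6.0+cu101 torchvision==0.7.0+cu101 torchaudio==0.6.0 -f https://download.pytorch.org/whl/torch_stable.html
(The +cu101 builds target CUDA 10.1.) This worked for me.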
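If pinning the older versions is not an option, another route might be to patch the failing line so it accepts either return style. The following is only a rough sketch, not svoice's own code: the helper name frames_from_info is hypothetical, and it assumes that the old-style result is a (signal_info, encoding_info) tuple whose length field counts samples across all channels, and that the new-style object exposes num_frames. The stand-in objects at the bottom exist only so the sketch runs without torchaudio installed.

```python
from types import SimpleNamespace


def frames_from_info(info):
    """Return the per-channel frame count from either style of torchaudio.info result.

    Hypothetical helper: pass it whatever torchaudio.info(file) returned.
    """
    if hasattr(info, "num_frames"):
        # Newer torchaudio: a single AudioMetaData-like object.
        return info.num_frames
    # Older torchaudio (assumed, e.g. 0.6.0): a (signal_info, encoding_info) tuple,
    # where signal_info.length is assumed to count samples over all channels.
    siginfo, _ = info
    return siginfo.length // siginfo.channels


# Stand-ins for the two return styles, so this runs without torchaudio.
new_style = SimpleNamespace(num_frames=16000, sample_rate=8000, num_channels=2)
old_style = (SimpleNamespace(length=32000, channels=2, rate=8000), None)

print(frames_from_info(new_style))
print(frames_from_info(old_style))
```

In practice you would call frames_from_info(torchaudio.info(file)) in place of the tuple unpacking shown in the traceback. Other parts of the code may depend on the old torchaudio API as well, so pinning the versions is the safer fix.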