speech, audio-processing

Best method to speed up/slow down spoken English (NOT music) recordings


I am looking for an algorithm to speed up English speech. Algorithms designed for speeding up music produce many artifacts above 2x speed, and I am looking for something that stays acceptably clear even at 3x or 4x.

The voice, intonation, and pauses all need to be preserved as much as possible, so a speech-to-text + text-to-speech approach will not work.

The traditional vocoder methods do not seem to be sufficient (though obviously I do not know all of them). I am interested in a newer procedural or machine-learning method. I have hundreds of hours of lectures for each speaker, with transcripts, so training data would not be a problem.

Use case: lecturers often speak at an impossibly slow pace. For example, I usually listen to recordings at 2x speed on Lynda, and those presenters are not even very slow.


Solution

  • The Sonic algorithm works pretty well for speech.
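
For anyone wondering what time-domain speed-up looks like in code, here is a minimal sketch of plain overlap-add (OLA) time compression in pure Python. To be clear, this is not Sonic's implementation and I am not claiming it matches Sonic's quality; the function names `hann` and `ola_speedup`, the 400-sample frame, and the 50% output overlap are all my own assumptions for illustration. It only shows the basic idea of reading windowed frames from the input at a faster hop than they are written to the output, which shortens the recording without resampling it.

```python
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def ola_speedup(samples, speed, frame=400):
    """Naive overlap-add time compression (illustrative only).

    samples: list of floats (mono PCM), speed: e.g. 3.0 for 3x.
    Frames are written every frame // 2 samples and read every
    (frame // 2) * speed samples, so the output is about 1/speed as long.
    No pitch alignment is done, so expect audible roughness.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    hop_out = frame // 2
    hop_in = hop_out * speed
    window = hann(frame)

    # Number of frames that start inside the input (at least one).
    n_frames = max(1, int((len(samples) - frame) / hop_in) + 1)
    out_len = (n_frames - 1) * hop_out + frame
    out = [0.0] * out_len
    norm = [0.0] * out_len  # summed window weights, used to normalise gain

    for k in range(n_frames):
        src = int(round(k * hop_in))  # read position in the input
        dst = k * hop_out             # write position in the output
        for i in range(frame):
            if src + i >= len(samples):
                break
            out[dst + i] += samples[src + i] * window[i]
            norm[dst + i] += window[i]

    # Divide by the accumulated window weight so overlaps do not change level.
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]
```

Calling `ola_speedup(samples, 3.0)` on a list of mono samples returns roughly one third as many samples. Because this sketch places frames at fixed positions regardless of the waveform, neighbouring frames can overlap out of phase; pitch-synchronous methods reduce that by aligning frames to the pitch period of the voice, which is the part a dedicated speech library handles for you.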