audio, signals, signal-processing, speech

Segment voiced and unvoiced speech?


I'd like to know how I can do a phonetic segmentation of an audio file. E.g. for "Father", I think it would be F-a-th-er.

I thought about using zero crossings to detect the voiced and unvoiced regions, but I'm not sure about it.

Thank you.


Solution

  • Zero-crossing, which you mentioned, is one way to go, as explained e.g. in this article. Others include neural networks or Hidden Markov Models.

    To get any decent results, you should also have a language model. It's much easier to work with sentences / words and only then translate these into phonemes. Why? Because context is essential for computer systems, and often even for us humans, to understand a word. Context provides constraints on which phonemes are plausible, and it's hard (impossible?) to work without it.
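
To make the zero-crossing idea concrete, here is a minimal sketch in plain Python. It only labels short frames as voiced, unvoiced, or silence; it does not split a word into phonemes. The function names, the frame sizes (400 samples with a hop of 160, i.e. 25 ms / 10 ms at an assumed 16 kHz sample rate), and both thresholds are assumptions made for illustration, and they would need tuning on real recordings. The rule of thumb behind it: voiced sounds tend to have high energy and few zero crossings, while unvoiced sounds such as "f" or "th" look noise-like and cross zero often.

```python
import math


def frame_signal(samples, frame_len, hop):
    """Split a list of samples into (possibly overlapping) frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def classify_frames(samples, frame_len=400, hop=160,
                    zcr_threshold=0.1, energy_threshold=1e-4):
    """Label each frame as 'silence', 'voiced' or 'unvoiced'.

    Both thresholds are illustrative guesses, not tuned values.
    Samples are assumed to be floats roughly in the range [-1, 1].
    """
    labels = []
    for frame in frame_signal(samples, frame_len, hop):
        if short_time_energy(frame) < energy_threshold:
            labels.append("silence")      # too quiet to decide
        elif zero_crossing_rate(frame) < zcr_threshold:
            labels.append("voiced")       # periodic, few sign changes
        else:
            labels.append("unvoiced")     # noise-like, many sign changes
    return labels


if __name__ == "__main__":
    # Synthetic stand-in for a voiced sound: a 150 Hz tone at 16 kHz.
    sample_rate = 16000
    tone = [0.5 * math.sin(2 * math.pi * 150 * n / sample_rate)
            for n in range(1600)]
    print(classify_frames(tone))
```

On real speech you would read the samples from the file first and probably smooth the labels over a few neighbouring frames, since single frames near a boundary are often misclassified.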