audio, neural-network, histogram, threshold, fuzzy-c-means

Audio segmentation


I am trying to separate the vowels from the consonants in an audio file (a WAV file). For example, given a recording of the sentence "I am fine", I need to separate the vowel sounds from the consonant sounds. After the separation, I can discard the consonants because they are not relevant to this project. I also need to ignore the pauses in speech (the pauses between words). So my problem is how to separate the vowels from the consonants.

I was advised that I could use the fuzzy c-means (FCM) algorithm or the histogram method for segmentation. I looked into both methods, but I could not find anything that helped.

Can someone walk me through the steps I need to take, or point me to some useful links? I can also use other methods (not necessarily FCM or histograms).

Thanks!


Solution

  • You can use hidden Markov model (HMM) based segmentation methods to segment your speech signal into its corresponding phonemes. To do this, you need a correct transcription of the speech signal and letter-to-sound (LTS) rules. Once the speech is segmented correctly, separating the vowels is straightforward. This link should be useful: http://hts.sp.nitech.ac.jp/
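
To illustrate the last step only (separating vowels once the segmentation exists), here is a minimal Python sketch. It does not perform the HMM alignment itself. It assumes the aligner has already produced a list of (start_sec, end_sec, phone) tuples; the ARPAbet-style vowel set, the "sil" pause label, the timings in example_alignment, and the function names vowel_segments and cut_samples are all assumptions made for illustration, so adjust them to whatever phone set and output format your tool actually uses.

```python
# Assumed ARPAbet-style vowel labels (hypothetical; match them to your aligner's phone set).
VOWELS = {
    "AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
    "EY", "IH", "IY", "OW", "OY", "UH", "UW",
}


def vowel_segments(alignment):
    """Keep only the vowel segments from (start_sec, end_sec, phone) tuples.

    Consonants and pause labels such as "sil" are not in VOWELS,
    so they are dropped by the same membership test.
    """
    kept = []
    for start, end, phone in alignment:
        label = phone.rstrip("012").upper()  # strip stress digits such as AY1
        if label in VOWELS:
            kept.append((start, end, label))
    return kept


def cut_samples(samples, sample_rate, segments):
    """Concatenate the audio samples that fall inside the given segments."""
    out = []
    for start, end, _ in segments:
        first = round(start * sample_rate)
        last = round(end * sample_rate)
        out.extend(samples[first:last])
    return out


# Made-up alignment for "I am fine"; the times are illustrative, not measured.
example_alignment = [
    (0.00, 0.10, "sil"),
    (0.10, 0.25, "AY1"),
    (0.25, 0.30, "sil"),
    (0.30, 0.40, "AE1"),
    (0.40, 0.50, "M"),
    (0.50, 0.55, "sil"),
    (0.55, 0.65, "F"),
    (0.65, 0.85, "AY1"),
    (0.85, 0.95, "N"),
]

vowels_only = vowel_segments(example_alignment)
```

With the samples of the WAV file loaded into a list (for example via the standard wave module), cut_samples(samples, sample_rate, vowels_only) would return only the vowel portions, with consonants and pauses left out.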