audio neural-network histogram threshold fuzzy-c-means

Audio segmentation

I am trying to separate vowels from consonants in an audio file (a WAV file). For example, if the file contains the sentence "I am fine", I need to separate the vowel sounds from the consonant sounds. After the separation I can ignore the consonants, because they are not important for this project. I also have to ignore the pauses in speech (the pauses between words). So my problem is: how do I separate the vowels from the consonants?

I was advised that for segmentation I could use a fuzzy c-means (FCM) algorithm or the histogram method. I looked into these two methods, but I could not find anything that helped me.

Can someone walk me through the steps, or point me to some useful links? I should mention that I can also use other methods (not necessarily FCM or histograms).

Thanks!

Solution

You can use hidden Markov model (HMM) based segmentation methods to split your speech signal into its phonemes. To do this you need a correct transcription of the speech signal, plus letter-to-sound (LTS) rules to turn that transcription into a phoneme sequence. Once the speech is correctly segmented into phonemes, separating the vowels is easy: keep only the segments whose phoneme label is a vowel. This link may be useful: http://hts.sp.nitech.ac.jp/
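Here is a minimal sketch of the last step only (picking the vowels out once you already have a phoneme segmentation). It does not run the HMM segmentation itself. It assumes the segmentation gives you (start, end, label) tuples in seconds with ARPAbet-style labels and "sil" for pauses; that format, the function names, and the timings for "I am fine" are all made up for illustration, so adapt them to whatever your tool actually outputs.

```python
from typing import List, Tuple

# Assumption: phoneme labels follow the ARPAbet convention (as in CMUdict),
# optionally with a trailing stress digit such as "AY1".
ARPABET_VOWELS = {
    "AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
    "EY", "IH", "IY", "OW", "OY", "UH", "UW",
}

# (start time in seconds, end time in seconds, phoneme label)
Segment = Tuple[float, float, str]


def is_vowel(label: str) -> bool:
    """Return True if the label is an ARPAbet vowel, ignoring stress digits."""
    return label.rstrip("012").upper() in ARPABET_VOWELS


def vowel_segments(alignment: List[Segment]) -> List[Segment]:
    """Keep only vowel segments; consonants and pauses ("sil") are dropped."""
    return [seg for seg in alignment if is_vowel(seg[2])]


def to_sample_ranges(segments: List[Segment], sample_rate: int) -> List[Tuple[int, int]]:
    """Convert segment times to (first sample, end sample) index pairs."""
    return [
        (int(round(start * sample_rate)), int(round(end * sample_rate)))
        for start, end, _ in segments
    ]


# Hypothetical alignment for "I am fine"; the timings are invented.
alignment = [
    (0.00, 0.10, "sil"),
    (0.10, 0.25, "AY1"),  # I
    (0.25, 0.33, "AE1"),  # a-
    (0.33, 0.40, "M"),    # -m
    (0.40, 0.48, "F"),    # f-
    (0.48, 0.70, "AY1"),  # -i-
    (0.70, 0.78, "N"),    # -ne
    (0.78, 0.90, "sil"),
]

vowels = vowel_segments(alignment)
ranges = to_sample_ranges(vowels, sample_rate=16000)
```

With the sample ranges you can slice the vowel parts out of the audio samples you loaded from the WAV file. The pauses between words are dropped by the same filter, because a silence label is not a vowel.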