machine-learning artificial-intelligence

Creating a voice identification system using machine learning

As an educational project in machine learning, I was thinking of creating a voice identification system from scratch. It should be able to identify a speaker from their voice after previously being trained on recordings of that voice.

What approach should I take in tackling this challenge? Specifically, how would such a system work at a high level?

Solution

To use your machine learning algorithm, you must first define the features you are going to feed it.

The easiest thing to do would be to compute the Fourier transform of the audio signal (any FFT tool will do, it's pretty standard) and build a feature vector from the frequencies and their amplitudes.
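To make this concrete, here is a minimal sketch of one way to turn an FFT into a fixed-length feature vector, assuming NumPy and a mono signal already loaded as an array. The function name `fft_features`, the number of bands and the 4 kHz cutoff are my own illustrative choices, not anything standard.

```python
import numpy as np

def fft_features(signal, sample_rate, n_bands=32, max_freq=4000.0):
    """Summarize a mono signal as the log of the average FFT magnitude
    in n_bands equal-width frequency bands between 0 and max_freq Hz.
    (Band count and cutoff are arbitrary illustrative defaults.)"""
    signal = np.asarray(signal, dtype=float)
    # A Hann window reduces spectral leakage at the edges of the clip.
    windowed = signal * np.hanning(len(signal))
    magnitudes = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)

    edges = np.linspace(0.0, max_freq, n_bands + 1)
    features = np.zeros(n_bands)
    for i in range(n_bands):
        in_band = (freqs >= edges[i]) & (freqs < edges[i + 1])
        if in_band.any():
            features[i] = magnitudes[in_band].mean()
    # Log compression keeps a few loud bands from dominating the vector.
    return np.log1p(features)
```

Averaging into bands means clips of different lengths give vectors of the same size, which most classifiers require.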

If that's not enough, you could use a spectrogram to add temporal information.
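A spectrogram is just the FFT applied to short overlapping frames of the signal. A rough sketch with NumPy follows; the name `spectrogram` and the frame and hop sizes (25 ms and 10 ms at an assumed 16 kHz sample rate) are assumptions I picked for illustration.

```python
import numpy as np

def spectrogram(signal, frame_len=400, hop=160):
    """Return a (n_frames, frame_len // 2 + 1) array of FFT magnitudes,
    one row per short overlapping frame of the signal.
    Defaults correspond to 25 ms frames every 10 ms at 16 kHz (assumed)."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    # Slice the signal into overlapping frames and window each one.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # One FFT per frame: rows are time steps, columns are frequency bins.
    return np.abs(np.fft.rfft(frames, axis=1))
```

Each row shows which frequencies are present at that moment, so you can see how the voice changes over time rather than only its average spectrum.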

Once the features are correctly set up, you can start playing with your favorite classification algorithm!
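If you don't have a favorite yet, a nearest-centroid classifier is about the simplest thing that could work: average each speaker's training vectors and assign a new vector to the closest average. The sketch below is only a stand-in for whatever classifier you end up choosing; the class name `CentroidSpeakerClassifier` is hypothetical, and it assumes every clip has already been converted to a feature vector of the same length.

```python
import numpy as np

class CentroidSpeakerClassifier:
    """Toy nearest-centroid classifier (illustrative stand-in only)."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.labels_ = np.unique(y)
        # One centroid per speaker: the mean of that speaker's feature vectors.
        self.centroids_ = np.stack(
            [X[y == label].mean(axis=0) for label in self.labels_]
        )
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Euclidean distance from every sample to every centroid.
        dists = np.linalg.norm(
            X[:, None, :] - self.centroids_[None, :, :], axis=2
        )
        return self.labels_[dists.argmin(axis=1)]
```

You would call `fit` with one feature vector per training clip and the matching speaker names, then `predict` on the features of an unseen clip. For a real project you will probably want something stronger, but this is enough to check that your features separate the speakers at all.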

If you use Python, I found this question explaining how to do the FFT part: FFT for Spectrograms in Python