speech-recognition cmusphinx sphinx4

Decide cluster size for speaker adaptation in Sphinx-4


In CMU Sphinx (Sphinx-4), I am using the following code snippet for speaker adaptation:

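import edu.cmu.sphinx.api.SpeechResult;
import edu.cmu.sphinx.decoder.adaptation.Stats;
import edu.cmu.sphinx.decoder.adaptation.Transform;

// recognizer (a StreamSpeechRecognizer) and stream (an InputStream) are set up beforehand
// Stats collects speaker-specific data; the argument is the number of clusters
Stats stats = recognizer.createStats(nrOfClusters);
SpeechResult result;
recognizer.startRecognition(stream);
while ((result = recognizer.getResult()) != null) {
    stats.collect(result);
}
recognizer.stopRecognition();

// Transform represents the speech profile
Transform transform = stats.createTransform();
recognizer.setTransform(transform);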

What value should nrOfClusters (the number of clusters) have to get good results? And how can this snippet be used to adapt to multiple speakers in the same audio?


Solution

  • What value should nrOfClusters (the number of clusters) have to get good results?

    The number of clusters depends on the amount of adaptation data: the more data you have, the more clusters you can use. For example, with 30 seconds of speech, 1 cluster is enough; with 10 minutes of speech, you can use up to 32 clusters.

  • How can this snippet be used to adapt to multiple speakers in the same audio?

    If you know when each speaker is talking, run adaptation for each speaker separately. There is not much sense in creating a shared transform for different speakers.
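A minimal, library-free Java sketch of the per-speaker approach, assuming you already have speaker turns (speaker label plus start and end times) from a separate diarization step. Segment, groupBySpeaker, clustersFor and clustersPerSpeaker are hypothetical helpers, not part of Sphinx-4. clustersFor is also a hypothetical heuristic built only from the two figures in this answer: it starts at 1 cluster for 30 seconds of speech and doubles the count each time the amount of speech doubles, capped at 32. That keeps it on the conservative side: 10 minutes of speech gives 16 clusters, within the "up to 32" mentioned above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical diarization segment: speaker label plus start/end in milliseconds. */
final class Segment {
    final String speaker;
    final long startMs;
    final long endMs;

    Segment(String speaker, long startMs, long endMs) {
        this.speaker = speaker;
        this.startMs = startMs;
        this.endMs = endMs;
    }

    long durationMs() {
        return endMs - startMs;
    }
}

/** Hypothetical planning helpers; none of this is Sphinx-4 API. */
final class SpeakerAdaptationPlan {
    // From the answer: about 30 s of speech is enough for 1 cluster...
    static final long BASE_MS = 30_000;
    // ...and 10 minutes of speech allows up to 32 clusters.
    static final int MAX_CLUSTERS = 32;

    /**
     * Hypothetical heuristic: 1 cluster for BASE_MS of speech, doubling the
     * count each time the amount of speech doubles, capped at MAX_CLUSTERS.
     */
    static int clustersFor(long speechMs) {
        int clusters = 1;
        long needed = 2 * BASE_MS; // speech required to move up to 2 clusters
        while (clusters < MAX_CLUSTERS && speechMs >= needed) {
            clusters *= 2;
            needed *= 2;
        }
        return clusters;
    }

    /** Groups segments by speaker, keeping speakers in first-seen order. */
    static Map<String, List<Segment>> groupBySpeaker(List<Segment> segments) {
        Map<String, List<Segment>> bySpeaker = new LinkedHashMap<>();
        for (Segment s : segments) {
            bySpeaker.computeIfAbsent(s.speaker, k -> new ArrayList<>()).add(s);
        }
        return bySpeaker;
    }

    /**
     * Cluster count per speaker, based only on that speaker's own speech.
     * Each value is what you would pass to recognizer.createStats(...)
     * when collecting stats for that speaker alone.
     */
    static Map<String, Integer> clustersPerSpeaker(List<Segment> segments) {
        Map<String, Integer> plan = new LinkedHashMap<>();
        for (Map.Entry<String, List<Segment>> e : groupBySpeaker(segments).entrySet()) {
            long totalMs = 0;
            for (Segment s : e.getValue()) {
                totalMs += s.durationMs();
            }
            plan.put(e.getKey(), clustersFor(totalMs));
        }
        return plan;
    }
}
```

For each speaker, you would then create a Stats object with that speaker's cluster count, run recognition over only that speaker's segments while calling stats.collect(result), and keep the Transform returned by stats.createTransform() for that speaker, passing it to recognizer.setTransform(...) before decoding that speaker's audio. How the audio is cut or restricted to one speaker's time ranges is not shown here and depends on your setup.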