
Binary Classification by using Gaussian Mixture Model


I want to implement the log-likelihood ratio T = log( f(x | client) ) - log( f(x | impostor) ) as a decision rule. My features for training and testing are 20*12. I am using the VOICEBOX MATLAB toolbox, and I wrote the following MATLAB code:

if max(lp_client)- max(lp_impostor) >0.35
   disp('accept');
else
   disp('reject');
end

Should I use the mean of the log probabilities or the max of the log probabilities?


Solution

  • You should use the sum of lp_client, because of the probabilistic nature of the estimate. If you have a sequence of independent events (feature independence is often assumed in this model), its probability is the product of the probabilities of the individual events:

    P(Seq | X) = P(feat1 | X) * P(feat2 | X) * ...

    Or, in the log domain:

    logP(Seq | X) = logP(feat1 | X) + logP(feat2 | X) + ...

    So actually

    logP ( x | client) = sum (lp_client)

    and

    logP(x | impostor) = sum (lp_impostor)
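
To make the rule above concrete, here is a minimal sketch of the decision using sums. It assumes lp_client and lp_impostor are vectors holding one log probability per feature vector, as in the question. The numeric values are made up for illustration, and the 0.35 threshold is simply carried over from the question, not a recommended value.

```matlab
% Hypothetical per-frame log probabilities (one value per feature vector).
% In practice these would come from scoring the test data against each GMM.
lp_client   = [-10.2; -9.8; -11.0; -10.5];
lp_impostor = [-10.9; -10.1; -11.4; -10.6];

% Under the independence assumption, the log probability of the whole
% sequence is the sum of the per-frame log probabilities.
logp_client   = sum(lp_client);
logp_impostor = sum(lp_impostor);

% Log-likelihood ratio T = log f(x | client) - log f(x | impostor).
T = logp_client - logp_impostor;

% Threshold carried over from the question; it should be tuned on held-out data.
threshold = 0.35;

if T > threshold
    decision = 'accept';
else
    decision = 'reject';
end
disp(decision);
```

Note that the mean is just the sum divided by the number of frames N, so when both models score the same N frames, comparing the difference of means against a threshold is the same test as comparing the difference of sums against N times that threshold. The max, on the other hand, looks at a single frame per model and discards the rest of the sequence.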