Tags: c, speech-recognition, mfcc, som

Speech recognition using a Kohonen network with MFCC features: how do I set the distance between the neurons and their weights?


I don't know how to set the location of each neuron in the map. Here are the neuron and map structures:

typedef struct _neuron
{
    mfcc_frame *frames;
    char *name;
    double *weights;
    int num_weights;
    int x;
    int y;
} neuron;
typedef struct _map
{
    neuron *lattice;
    int latice_size;
    double mapRadius;
    int sideX, sideY; 
    int scale;
} map;

If I have more than one sample of the same word, how do I calculate the distance between the input pattern (a word) and a neuron?

I am not sure about the weights. I define the weights from the MFCC features of a word, but during training I need to update them according to the distance between the neurons. I am using the Euclidean distance between the neurons, but my doubt is how to update the weights. Here is the code that initializes the neurons:

void init_neuron(neuron *n, int x, int y, mfcc_frame *mfcc_frames, unsigned int n_frames, char *name){
    unsigned int i, j;

    n->frames = mfcc_frames;
    n->num_weights = n_frames;
    n->x = x;
    n->y = y;

    /* +1 for the terminating '\0' that strcpy writes */
    n->name = malloc(strlen(name) + 1);
    strcpy(n->name, name);
    n->weights = malloc(n_frames * sizeof (double));

    /* Note: weights[i] is overwritten on every pass of the inner loop,
       so only the last feature (j == N_MFCC - 1) of each frame is kept. */
    for(i = 0; i < n_frames; i++)
        for(j = 0; j < N_MFCC; j++)
            n->weights[i] = mfcc_frames[i].features[j];

    printf("%s lattice %d, %d\n", n->name, n->x, n->y);
}

And the map initialization:

map* init_map(int sideX, int sideY, int scale){
    register int i, x, y;
    char *name = NULL;
    void **word_adresses;
    unsigned int n = 0, count = 0;
    int aux = 0;
    word *words = malloc(sizeof(word));

    map *_map = malloc(sizeof(map));
    _map->latice_size = sideX * sideY;
    _map->sideX       = sideX;
    _map->sideY       = sideY;
    _map->scale       = scale;
    _map->lattice     = malloc(_map->latice_size * sizeof(neuron));
    mt_seed ();

    if ((n = get_list(words))){
        word_adresses = malloc(n * sizeof(void *));
        while (words != NULL){
            /* Each word is placed in a random cell. Two words can land in the
               same cell, and cells that are never picked stay uninitialized. */
            x = mt_rand() %sideX;
            y = mt_rand() %sideY;
            printf("y : %d  x: %d\n", y, x);
            init_neuron(_map->lattice + y * sideX + x, x, y, words->frames, words->n, words->name);

            word_adresses[count++] = words;
            words = words->next;
        }
        /* Free the list nodes; the neurons still point at the frames. */
        for (i = 0; i < count; i++)
            free(word_adresses[i]);
        free(word_adresses);
        aux++;
    }

    return _map;
}


Solution

  • In a Kohonen SOM, the weights live in the feature space: each neuron holds one prototype (codebook) vector with the same dimension as the input. If the input is 12 MFCCs, each input is a vector of 12 double values, so each neuron also has 12 weights, one per MFCC. Given an input, you find the best matching unit (the neuron whose weights are closest to the input), then move that neuron's 12 codebook values a small step toward the input vector, with the step size set by the learning rate.
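
The training step described above could be sketched roughly as follows. This is a minimal illustration, not the asker's code: the names som_neuron, som_init, feature_dist2, find_bmu and train_step, the 3x3 lattice size, and the zero initialization are all hypothetical. The sketch also assumes a simple "bubble" neighborhood (every neuron within a given lattice radius of the best matching unit is updated by the same amount), which goes slightly beyond the answer; with radius 0 only the best matching unit moves, exactly as the answer describes. Note that the two distances play different roles: the distance in feature space selects the winner, while the distance between the x, y lattice positions only decides which neighbors are updated.

```c
#include <float.h>

#define N_MFCC 12  /* assumed feature dimension, matching the 12-MFCC example */
#define SIDE_X 3   /* hypothetical lattice width */
#define SIDE_Y 3   /* hypothetical lattice height */

typedef struct {
    double weights[N_MFCC]; /* prototype vector in feature space */
    int x, y;               /* fixed position on the lattice */
} som_neuron;

/* Give every neuron its grid position. Weights start at zero only to keep
   the sketch deterministic; small random values are the usual choice. */
static void som_init(som_neuron *lattice)
{
    for (int y = 0; y < SIDE_Y; y++)
        for (int x = 0; x < SIDE_X; x++) {
            som_neuron *n = &lattice[y * SIDE_X + x];
            n->x = x;
            n->y = y;
            for (int i = 0; i < N_MFCC; i++)
                n->weights[i] = 0.0;
        }
}

/* Squared Euclidean distance between an input and a prototype (feature space). */
static double feature_dist2(const double *input, const double *w)
{
    double d = 0.0;
    for (int i = 0; i < N_MFCC; i++) {
        double diff = input[i] - w[i];
        d += diff * diff;
    }
    return d;
}

/* Index of the best matching unit: the neuron closest to the input. */
static int find_bmu(const som_neuron *lattice, int n, const double *input)
{
    int best = 0;
    double best_d = DBL_MAX;
    for (int k = 0; k < n; k++) {
        double d = feature_dist2(input, lattice[k].weights);
        if (d < best_d) {
            best_d = d;
            best = k;
        }
    }
    return best;
}

/* One training step: find the BMU, then pull it and every neuron within
   `radius` lattice cells of it toward the input by a fraction `rate`.
   Returns the BMU index. */
static int train_step(som_neuron *lattice, int n, const double *input,
                      double rate, int radius)
{
    int b = find_bmu(lattice, n, input);
    for (int k = 0; k < n; k++) {
        int dx = lattice[k].x - lattice[b].x;
        int dy = lattice[k].y - lattice[b].y;
        if (dx * dx + dy * dy > radius * radius)
            continue; /* outside the neighborhood: leave unchanged */
        for (int i = 0; i < N_MFCC; i++)
            lattice[k].weights[i] += rate * (input[i] - lattice[k].weights[i]);
    }
    return b;
}
```

To use it, one would call som_init on an array of SIDE_X * SIDE_Y neurons and then call train_step once per input vector, typically shrinking rate and radius as training progresses. Each input here is a single 12-value frame; how to turn a whole word of variable length into fixed-size inputs is a separate design decision that this sketch does not address.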