Let TARGET be a set of strings that I expect to be spoken. Let SOURCE be the set of strings returned by a speech recognizer (that is, the possible sentences that it has heard). I need a way to choose a string from TARGET. I read about the Levenshtein distance and the Damerau-Levenshtein distance, which basically return the distance between a source string and a target string, that is, the number of changes needed to transform the source string into the target string.
But, how can I apply this algorithm to a set of target strings?
I thought I'd use the following method:
For each string in TARGET, I calculate the distance from each string in SOURCE. In this way we obtain an m-by-n matrix, where m is the cardinality of TARGET and n is the cardinality of SOURCE. We could say that the i-th row represents the similarity of the sentences detected by the speech recognizer with respect to the i-th target.
For each row I calculate average_on_row(i), where i is the row index.
For each row I then calculate standard_deviation_sum(i), the sum of the row's average and its standard deviation. Each such sum refers to a string of TARGET.
The string which is associated with the smallest standard_deviation_sum could be the sentence pronounced by the user. Can the method I used be considered correct, or are there other methods?
Obviously, values that are too high indicate that the sentence pronounced by the user probably does not belong to TARGET at all.
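For concreteness, here is a minimal sketch of the method just described, in Python. The function names and the reject_threshold parameter are my own, and I read standard_deviation_sum(i) as the row's average plus the row's standard deviation:

```python
from statistics import mean, stdev

def levenshtein(s, t):
    """Classic dynamic-programming edit distance (insertions,
    deletions, substitutions, all with cost 1)."""
    n = len(t)
    prev = list(range(n + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i] + [0] * n
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr[j] = min(prev[j] + 1,         # delete cs
                          curr[j - 1] + 1,     # insert ct
                          prev[j - 1] + cost)  # substitute / match
        prev = curr
    return prev[n]

def pick_target(targets, sources, reject_threshold=None):
    """Score each TARGET string by average_on_row(i) plus the row's
    standard deviation, and return the lowest-scoring target."""
    best, best_score = None, float("inf")
    for t in targets:
        row = [levenshtein(src, t) for src in sources]  # one matrix row
        score = mean(row) + (stdev(row) if len(row) > 1 else 0.0)
        if score < best_score:
            best, best_score = t, score
    if reject_threshold is not None and best_score > reject_threshold:
        return None  # score too high: probably not in TARGET at all
    return best

# e.g. pick_target(["turn on the light", "turn off the light"],
#                  ["turn off the lite"])  ->  "turn off the light"
```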
You need to calculate these probabilities first: the probability of insertion, deletion and substitution. Then use the log of these probabilities as penalties for each operation.
In a "context independent" situation, if pi is the probability of insertion, pd the probability of deletion, and ps the probability of substitution, then the probability of observing the same symbol is pp = 1 - ps - pd.
In this case use log(pi/pp/k), log(pd/pp) and log(ps/pp/(k-1)) as the penalties for insertion, deletion and substitution respectively, where k is the number of symbols in the system.
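As a sketch (with made-up probability values, which you would estimate from data), this weighted distance could look like the following in Python. Because each penalty is a negative log probability and a match contributes 0, the dynamic program maximizes the total score rather than minimizing a cost:

```python
import math

k  = 26            # alphabet size (assumption: lowercase letters)
pi = 0.01          # P(insertion)    -- placeholder values;
pd = 0.01          # P(deletion)        estimate these from data
ps = 0.02          # P(substitution)
pp = 1 - ps - pd   # P(symbol observed unchanged)

INS = math.log(pi / pp / k)        # penalty per inserted symbol
DEL = math.log(pd / pp)            # penalty per deleted symbol
SUB = math.log(ps / pp / (k - 1))  # penalty per substituted symbol

def log_score(source, target):
    """Best total penalty of transforming source into target; up to an
    additive constant, this is log P(target | source)."""
    m, n = len(source), len(target)
    prev = [j * INS for j in range(n + 1)]
    for i in range(1, m + 1):
        curr = [prev[0] + DEL] + [0.0] * n
        for j in range(1, n + 1):
            diag = prev[j - 1] + (0.0 if source[i - 1] == target[j - 1] else SUB)
            curr[j] = max(diag, prev[j] + DEL, curr[j - 1] + INS)
        prev = curr
    return prev[n]

# The most plausible transcript is then the highest-scoring target:
# best = max(targets, key=lambda t: log_score(heard, t))
```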
Essentially, if you use this distance measure between a source and a target, you get the log probability of observing that target given the source. If you have a bunch of training data (i.e. source-target pairs), choose some initial estimates for these probabilities, align the source-target pairs, and re-estimate the probabilities from the alignments (an EM strategy).
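Here is a sketch of one such re-estimation pass, under two simplifications of my own: the pairs are aligned with the plain unit-cost edit distance rather than the weighted one, and the counts are normalized in one simple (not the only possible) way. Iterating alignment and re-estimation, using the weighted distance above, gives the EM-style loop described:

```python
def count_ops(source, target):
    """Backtrace one minimal edit alignment and count
    (matches, substitutions, insertions, deletions)."""
    m, n = len(source), len(target)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
    match = sub = ins = dele = 0
    i, j = m, n
    while i > 0 or j > 0:
        same = i > 0 and j > 0 and source[i - 1] == target[j - 1]
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (0 if same else 1):
            match, sub = (match + 1, sub) if same else (match, sub + 1)
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            dele += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return match, sub, ins, dele

def estimate_probs(pairs):
    """Pool operation counts over (source, target) training pairs and
    turn them into the probabilities used above."""
    match = sub = ins = dele = 0
    for s, t in pairs:
        dm, ds, di, dd = count_ops(s, t)
        match += dm; sub += ds; ins += di; dele += dd
    consumed = match + sub + dele     # events that consume a source symbol
    ps = sub / consumed
    pd = dele / consumed
    pi = ins / (consumed + ins)       # crude insertion rate (an assumption)
    return {"pi": pi, "pd": pd, "ps": ps, "pp": 1 - ps - pd}
```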
You can start with one set of probabilities and assume context independence. Later you can assume some kind of clustering among the contexts (e.g. assume there are k different sets of letters whose substitution rates differ...).