This question is a concept check. I have the string 000.00-010.0.0.0 and want to find its closest match in the list {000.00-012.0.0.0, 000.00-008.0.0.0}, using a numerical distance measure in addition to the edit distance. I'd like to treat '012', '010' and '008' as tokens and measure the distance between them.
The standard approach to string matching looks for a change at each character position, sums the changes and returns a distance. A modified distance also measures the ASCII distance between the characters: G is farther from E than D is.
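A minimal C++ sketch of the two per-character measures described above, assuming equal-length strings. The names hamming and asciiDistance are illustrative, not taken from the question. The sketch also shows why per-character treatment misleads here. Compared character by character, '010' scores 2 against '012' but 9 against '008' under the ASCII measure, although both are numerically 2 away.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>

// Plain Hamming distance: count the positions where the characters differ.
// Assumes both strings have the same length, as in the question.
int hamming(const std::string& a, const std::string& b) {
    assert(a.size() == b.size());
    int d = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (a[i] != b[i]) ++d;
    return d;
}

// "Modified" distance: instead of adding 1 per mismatch, add the absolute
// difference of the character codes, so 'G' vs 'E' (2) costs more than 'D' vs 'E' (1).
int asciiDistance(const std::string& a, const std::string& b) {
    assert(a.size() == b.size());
    int d = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        d += std::abs(static_cast<unsigned char>(a[i]) - static_cast<unsigned char>(b[i]));
    return d;
}
```

Under this character-level measure, asciiDistance("010", "012") is 2 and asciiDistance("010", "008") is 9. This is the imbalance the question wants to remove by treating each three-character field as one token.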
To measure that '012' is as far from '010' as '008' is, three characters must be bundled into one token. Can such a token easily be measured for both edit distance and numerical distance? The problem is made more complicated by the fact that the delimiters are removed in the tree database.
The solution I'd like a reality check on is this: convert '012', '010' and '008' into single-character ASCII symbols, say ), * and + respectively, measure the character distance and the string edit distance, and then convert the symbols back to '012', '010' and '008' when printing.
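A minimal C++ sketch of that proposal, assuming a small fixed lookup table. The names kEncode, kDecode, encode, decode and symbolDistance are hypothetical. The symbols ), * and + have consecutive ASCII codes (41, 42, 43), so '010' stays equidistant from '012' and '008' after encoding.

```cpp
#include <cstdlib>
#include <map>
#include <string>
#include <vector>

// Hypothetical lookup table: each three-character token becomes one symbol,
// in the order given in the question ('012' -> ')', '010' -> '*', '008' -> '+').
const std::map<std::string, char> kEncode = {{"012", ')'}, {"010", '*'}, {"008", '+'}};

// Reverse table, built from kEncode so the two cannot disagree.
std::map<char, std::string> makeDecode() {
    std::map<char, std::string> d;
    for (const auto& kv : kEncode) d[kv.second] = kv.first;
    return d;
}
const std::map<char, std::string> kDecode = makeDecode();

// Encode a sequence of tokens. std::map::at throws std::out_of_range
// for a token that is not in the table.
std::string encode(const std::vector<std::string>& tokens) {
    std::string out;
    for (const auto& t : tokens) out += kEncode.at(t);
    return out;
}

// Convert symbols back to the original tokens for printing.
std::vector<std::string> decode(const std::string& symbols) {
    std::vector<std::string> out;
    for (char c : symbols) out.push_back(kDecode.at(c));
    return out;
}

// Character distance between two encoded symbols.
int symbolDistance(char a, char b) {
    return std::abs(a - b);
}
```

The idea works as long as every token value that can occur gets its own code and the codes keep the numeric spacing (e.g. adding '009' and '011' would require codes between the existing ones). A char can hold at most 256 distinct values, so the token alphabet has to stay small.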
Sample string: MER99.C0.00M.14.006.00.060.350
There are also wildcards:
MER99.*.006.00.060.350
MER99.C0.00M.??.006.00.060.350
Since the strings all have the same length (some need a dummy character as padding; '00M' is actually 'M'), matching uses the Hamming distance.
I do not need help with the matching algorithm, the Hamming distance approach, the wildcards or the dummy character; I added these for context. Right now I treat each token as separate characters and get good results, but I know they are not as exact as they could be if each token were handled as a unit. The limiting factor is probably the inconsistency within the coding schema, but I'd like that to be the limit, not my algorithm.
Your strings contain alphanumeric characters, i.e. base-36 numbers. Furthermore, these characters are grouped into tokens. A token cannot be stored in a char, but you can store it in an int.
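A minimal sketch of reading a token as an int, assuming each field has already been split out of the string. The name tokenToInt is hypothetical. It relies on std::stoi, which accepts bases 2 to 36 and reads the letters A to Z (either case) as 10 to 35.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Read a token such as "010", "00M" or "C0" as a number in the given base.
// Throws std::invalid_argument (from std::stoi, or from the check below)
// if the token is not a valid number in that base.
int tokenToInt(const std::string& token, int base = 36) {
    std::size_t used = 0;
    int value = std::stoi(token, &used, base);
    if (used != token.size())
        throw std::invalid_argument("not a base-" + std::to_string(base) + " token: " + token);
    return value;
}
```

One caveat: in base 36, '010' is 36, '012' is 38 and '008' is 8, so '010' is no longer equidistant from the other two. If decimal distance is what matters for all-digit fields, parsing those fields with base 10 (tokenToInt("010", 10) gives 10) keeps '008' and '012' both 2 away.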
Instead of storing ints in your tree, you can store a pair<char, int>, where the char tells the type of the value:
- 0 for a numeric value
- 1 for *
- 2 for xxxx? (mask)
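A minimal sketch of the tagged-pair idea, assuming the pair is a std::pair<char, int> using the tags 0, 1 and 2 listed above. The names Token, tokenDistance and sequenceDistance are hypothetical. As a simplifying assumption, * and mask fields match anything at cost 0; partial masks and a * that spans several fields (as in MER99.*.006.00.060.350) are not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <utility>
#include <vector>

// Type tags for the char half of the pair.
constexpr char kNumeric = 0;  // second holds the token's numeric value
constexpr char kStar    = 1;  // '*' wildcard
constexpr char kMask    = 2;  // masked field such as "??"

using Token = std::pair<char, int>;

// Distance between one pattern field and one concrete field.
// Assumption: wildcard and mask fields match anything (cost 0).
int tokenDistance(const Token& pattern, const Token& value) {
    if (pattern.first == kStar || pattern.first == kMask) return 0;
    return std::abs(pattern.second - value.second);
}

// Field-by-field sum over equal-length token sequences: a Hamming-style
// distance where each mismatch costs the numeric gap instead of 1.
int sequenceDistance(const std::vector<Token>& pattern, const std::vector<Token>& value) {
    assert(pattern.size() == value.size());
    int d = 0;
    for (std::size_t i = 0; i < pattern.size(); ++i)
        d += tokenDistance(pattern[i], value[i]);
    return d;
}
```

With the tokens stored as numbers, '010' is 2 away from both '012' and '008' (reading the fields in decimal), which is the behaviour the question asks for, and the wildcard handling lives in the tag rather than in the characters.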