
Set of unambiguous looking letters & numbers for user input


Is there an existing subset of the alphanumerics that is easier to read? In particular, is there a subset with fewer visually ambiguous characters, so that removing (or equating) certain characters reduces human error?

I know "visually ambiguous" is a somewhat vague expression, but it is fairly evident that D, O and 0 all look similar, as do 1 and I. I would like to keep the set of alphanumerics as large as possible while minimizing the number of characters that are likely to be misread.

The only precedent I am aware of for such a set is the Canadian postal code system, which omits the letters D, F, I, O, Q, and U; that subset was created to aid the postal system's OCR process.

My initial thought is to use only capital letters and numbers as follows:

A
B = 8
C = G
D = 0 = O = Q
E = F
H
I = J = L = T = 1 = 7
K = X
M
N
P
R
S = 5
U = V = Y
W
Z = 2
3
4
6
9
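
The equivalence classes listed above could be applied to user input roughly as follows. This is only a minimal sketch: the choice of the first character in each group as the canonical form, the names `GROUPS`, `CANONICAL` and `normalize`, and the decision to skip spaces and hyphens are all assumptions made for illustration.

```python
# Equivalence classes from the list above. The first character of each
# group is (arbitrarily) treated as the canonical form; the remaining
# characters are look-alikes that get folded into it.
GROUPS = [
    "A", "B8", "CG", "D0OQ", "EF", "H", "IJLT17", "KX", "M", "N",
    "P", "R", "S5", "UVY", "W", "Z2", "3", "4", "6", "9",
]

# Map every character to the canonical character of its group.
CANONICAL = {ch: group[0] for group in GROUPS for ch in group}


def normalize(code: str) -> str:
    """Fold user input to canonical characters; raise on unknown ones."""
    result = []
    for ch in code.upper():
        if ch in " -":  # assumption: spaces and hyphens are separators
            continue
        if ch not in CANONICAL:
            raise ValueError(f"unexpected character: {ch!r}")
        result.append(CANONICAL[ch])
    return "".join(result)
```

With this mapping, two inputs that differ only by look-alike characters (for example `O` typed instead of `0`) normalize to the same string, at the cost of shrinking the effective alphabet from 36 symbols to 20.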

This problem may be difficult to separate from the chosen typeface. The distinctiveness of its characters could significantly affect how ambiguous any two characters look, but I expect that in most modern typefaces the characters equated above will look similar enough to warrant equating them.

I would be grateful for thoughts on the above – are the above equations suitable, or perhaps are there more characters that should be equated? Would lowercase characters be more suitable?


Solution

  • Mainly drawing inspiration from this UX thread, mentioned by @rwb:

    • Several programs use similar sets. The list in your post is very similar to the ones used in those programs, and I think it should be enough for most purposes. You can always add redundancy (error correction) to "forgive" minor mistakes; this requires spacing out your codes so that any two valid codes differ in several positions (see Hamming distance), though.
    • No references are given for how the lists were derived, apart from trial and error with humans (which is fine when the readers are humans rather than OCR).
    • It may make sense to use character grouping (say, groups of 5) to increase context ("first character in the second of 5 groups")
    • Ambiguity can be eliminated by using complete nouns (from a dictionary with few look-alikes; word-edit-distance may be useful here) instead of characters. People may confuse "1" with "i", but few will confuse "one" with "ice".
    • Another option is to turn your code into a (fake) word that can be read out loud. A Markov model may help you there.
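
The redundancy and grouping suggestions above could be combined along these lines. This is a hedged sketch, not a method taken from the thread: the 20-symbol `ALPHABET` (one character per look-alike group in the question), the single check character, and all function names are assumptions. A single check character only detects one mistyped character; it is a simpler stand-in for real error correction and does not catch two swapped characters.

```python
import secrets

# Hypothetical alphabet: one canonical character per look-alike group
# from the question's list (20 symbols).
ALPHABET = "ABCDEHIKMNPRSUWZ3469"
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def check_char(payload: str) -> str:
    """Character that makes the sum of all symbol indices 0 modulo 20."""
    total = sum(INDEX[ch] for ch in payload)
    return ALPHABET[-total % len(ALPHABET)]


def make_code(payload_len: int = 9) -> str:
    """Random payload followed by its check character."""
    payload = "".join(secrets.choice(ALPHABET) for _ in range(payload_len))
    return payload + check_char(payload)


def is_valid(code: str) -> bool:
    """True if every symbol is known and the indices sum to 0 modulo 20."""
    if not code or any(ch not in INDEX for ch in code):
        return False
    return sum(INDEX[ch] for ch in code) % len(ALPHABET) == 0


def grouped(code: str, size: int = 5) -> str:
    """Split a code into hyphen-separated groups for display."""
    return "-".join(code[i:i + size] for i in range(0, len(code), size))
```

A caller would show `grouped(make_code())` to the user and, on entry, strip the hyphens before passing the result to `is_valid`. Replacing any one character with a different symbol changes the sum by a non-zero amount modulo 20, so the mistake is rejected rather than silently accepted.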