I'm working on an OCR use case and have identified common misclassifications from the confusion matrix, for example '1' being confused with 'J', and '2' being confused with 'Z' and 'J'.
For a given word, I am trying to write a Python script that generates all the variations of that word which account for these misclassifications.
How do I go about solving this?
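You get a neat solution by using a dictionary of all possible classifications, not just the misclassifications. That is, you first "enrich" your misclassification dictionary so that every character also maps to itself (its correct classification). The Cartesian product of the per-character options then gives every candidate string.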
from itertools import product

all_characters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
common_misclass = {'1': ['J'], '2': ['Z', 'J']}
input_string = "AB1CD2"

# Map every character to itself plus any known misclassifications.
common_class = {}
for char in all_characters:
    if char in common_misclass:
        common_class[char] = [char] + common_misclass[char]
    else:
        common_class[char] = [char]

# Build one string per combination of per-character options.
possible_outputs = ["".join(tup) for tup in
                    product(*[common_class[letter] for letter in input_string])]
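If you would rather not list every valid character up front, the same idea can be wrapped in a small function that falls back to the character itself whenever it has no known confusions. This is only a sketch: the function name `expand_misclassifications` is my own, and it assumes the dictionary maps each character to a list of the characters it can be misread as, as in the question.

```python
from itertools import product


def expand_misclassifications(word, misclass):
    """Return every variant of `word` in which each character is either
    kept as-is or replaced by one of its known misclassifications."""
    # dict.get returns an empty list for characters with no known
    # confusions, so those characters only map to themselves.
    choices = [[ch] + list(misclass.get(ch, [])) for ch in word]
    return ["".join(combo) for combo in product(*choices)]


common_misclass = {'1': ['J'], '2': ['Z', 'J']}
variants = expand_misclassifications("AB1CD2", common_misclass)
print(variants)
# ['AB1CD2', 'AB1CDZ', 'AB1CDJ', 'ABJCD2', 'ABJCDZ', 'ABJCDJ']
```

Keep in mind that the number of variants is the product of the number of options per character, so it can grow quickly for long words with many confusable characters.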