Search code examples
machine-learningcomputer-visionartificial-intelligencenearest-neighbor

Extracting properties of handwritten digits to fasten nearest neighbour algorithm


I have 1024 bit long binary representation of three handwritten digits: 0, 1, 8. Basically, in 32x32 bitmap of a digit, rows are concatenated to form a binary vector.

There are 50 binary vectors for each digit. When we apply Nearest neighbour to each digit, we can use hamming distance metric or some other, and then apply the algorithm to differentiate between the vectors. Now I want to use another technique where instead of looking at each bit of a vector, I would like to analyse on less number of bits while comparing the vectors.

For example, I know that when one compares bitmap(size:1024 bits) of digits '8' and '0', We must have 1s in middle of the vector of digit '8' as there digit 8 visually appears as the combination of two zeros placed in column. So our algorithm would look for the intersection of two zeros(which would be the middle of digit.

Thats the way I want to work. I want to convert the low level representation(looking at 1024 bitmap vector) to the high level representation(that consist of two properties extracted from bitmap).

Any suggestion? I hope, the question is somewhat clear to the audience.


Solution

  • Idea 1: Flood fill

    This idea does not use the 50 patterns you have per digit: it is based on the idea that usually a "1" has all 0-bits connected around that "1" shape, while a "0" separates the 0-bits inside it from those outside it, and an "8" has two such enclosed areas. So counting connected areas of 0-bits would identify which of the three it is.

    So you could use a flood fill algorithm, starting at any 0 bit in the vector, and set all those connected 0-bits to 1. In a 1 dimensional array you need to take care to correctly identify connected bits (either horizontally: 1 position apart, but not crossing a 32 boundary, or vertically... 32 positions apart). Of course, this flood-filling will destroy the image - so make sure to use a copy. If after one such flood-fill there are still 0 bits (which were therefore not connected to those you turned into 1), then choose one of those and start a second flood-fill there. Repeat if necessary.

    When all bits have been set to 1 in that way, use the number of flood-fills you had to perform, as follows:

    1. One flood-fill? It's a "1", because all 0-bits are connected.
    2. Two flood-fills? It's a "0", because the shape of a zero separates two areas (inside/outside)
    3. Three flood-fills? It's an "8", because this shape separates three areas of connected 0-bits.

    Of course, this process assumes that these handwritten digits are well-formed. For example, if an 8-shape would have a small gap, like here:

    enter image description here

    ..then it will not be identified as an "8", but a "0". This particular problem could be resolved by identifying "loose ends" of 1-bits (a "line" that stops). When you have two of those at a short distance, then increase the number you got from flood-fill counting with 1 (as if those two ends were connected).

    Similarly, if a "0" accidentally has a small second loop, like here:

    enter image description here

    ...it will be identified as an "8" instead of a "0". You could prevent this particular problem by requiring that each flood-fill finds a minimum number of 0-bits (like at least 10 0-bits) to count as one.

    Idea 2: probability vector

    For each digit, add up the 50 example vectors you have, so that for each position you have a count somewhere between 0 to 50. You would have one such "probability" vector per digit, so prob0, prob1 and prob8. If prob8[501] = 45, it means that it is highly probable (45/50) that an "8" vector will have a 1-bit at index 501.

    Now transform these 3 probability vectors as follows: instead of storing a count per position, store the positions in order of decreasing count (probability). So if prob8[513] has the highest value (like 49), then that new array should start like [513, ...]. Let's call these new vectors A0, A8 and A1 (for the corresponding digit).

    Finally, when you need to match a given input vector, simultaneously go through A0, A1 and A8 (always looking at the same index in the three vectors) and keep 3 scores. When the input vector has a 1 at the position specified in A0[i], then add 1 to score0. If it also has a 1 at the position specified in A1[i] (same i), then add 1 to score1. Same thing for score8. Increment i, and repeat. Stop this iteration as soon as you have a clear winner, i.e. when the highest score among score0, score1 and score8 has crossed a threshold difference with the second highest score among them. At that point you know which digit is being represented.