Search code examples
compressionencodevin

Assigning a distinct number to a string


Lets say that I have a VIN like this: SB164ABN10E082986.

Now, I want to assign an integer to each possible VIN (without WMI, which is the first three digits -> 64ABN10E082986) in a manner that I retrieve the VIN from this integer afterwards.

What would be the best way of doing so? It can be used to the advantage of such algorithm that the first 10 digits can be composed from those values:

1234567890 ABCDEFGH JKLMN P RSTUVWXYZ

and the last 4 can be composed of all one-digit numbers (0-9).

Background: I want to be able to save memory. So, in a sense I'm searching for a special way of compression. I calculated that an 8 Byte integer would suffice under those conditions. I am only missing the way of doing "the mapping".

This is how it should work:

VIN -> ALGORITHM -> INDEX
INDEX -> ALGORITHM REVERSED -> VIN 

Solution

  • Each character becomes a digit in a variable-base integer. Then convert those digits to the integer.

    Those that can be digits or one of 23 letters is base 33. Those that can only be digits are base 10. The total number of possible combinations is 3310 times 104. The logarithm base two of that is 63.73, so it will just fit in a 64-bit integer.

    You start with zero. Add the first digit. Multiply by the base of the next digit (33 or 10). Add that digit. Continue until all digits processed. You have the integer. Each digit is 0..32 or 0..9. Take care to properly convert the discontiguous letters to the contiguous numbers 0..32.

    Your string 64ABN10E082986 is then encoded as the integer 2836568518287652986. (I gave the digits the values 0..9, and the letters 10..32.)

    You can reverse the process by taking the integer and both dividing it by the last base and taking the modulo the last base. The result of the modulo is the last digit. Continue with the quotient from dividing for the next digit.

    By the way, in the US anyway the last five characters of the VIN must be numeric digits. I don't know why you are only considering four.