Search code examples
algorithmencodingdecoding

How to encode and decode sequential integer from DB to a pseudorandom Base36 6 character long string


For a few days now I’m trying to find a solution for this problem but I haven’t had success yet.

I need to implement 2 functions: One for encoding integer to string and one for decoding string to the original integer. The encoded string needs to be 6 characters long, composed of uppercase letters and numbers (Base36) and needs to appear random. It doesn’t need to be highly secure, it just needs to appear random enough so someone who sees it couldn’t make a connection to the original integer at first sight. Because I’m trying to encode IDs, every string should be unique for each given integer (no collisions). I found and implemented a few algorithms that fulfill these requirements but there’s a problem: numbers that are next to each other are too similar. Basically they differ by one character. Even though it doesn’t have to be very secure, close numbers need to be as different as possible. In reality, this algorithm will encode integers up to a 100k at most so higher values will most likely never be used, but for the solution to be great it would need to cover values up to few milions.

Example of what I need: 1 -> A39MNY 2 -> HDY19X … 15 -> 2I8PZ9 etc.

I achieved this by using seeded Random objects, but the decoding part is the problem since I made the seed out of the inputted integer modified by a certain operations, so for the encoding to be achieved in one iteration, the original integer has to be known. Decoding is achieved by running encoding function until the number is found.

I’m willing to use some kind of encryption if these requirements are’t achievable by crafting some kind of simpler algorithm.

I’ve got really interested by this problem and I’m still actively digging for the solution. Any help that gets me a step closer to the solution is much appreciated.


Solution

  • Pick a large prime. Pick a random number in the range. Use Fermat's little theorem to scramble a large range.

    You can encode 36^6 = 2176782336 different strings. Looking for a prime p a bit smaller than that, the best we can do is 2176782317.

    So now we just need a large number n.

    import random
    print(random.randomint(1, 2176782317))
    

    This gave me 1173770130.

    Do a little arithmetic to find its inverse. For that we use the fact that if p is prime and a is relatively prime to p, then mod p, 1 = a^(p-1) = a * a^(p-2) and so the multiplicative inverse is a^(p-2).

    def n_pow_m_mod_k (n, m, k):
        answer = 1
        power = n
        while 0 < m:
            if 1 == m % 2:
                answer = (answer * power) % k
                m -= 1
            m = m // 2
            power = (power * power) % k
        return answer
    
    print(n_pow_m_mod_k(1173770130, 2176782317 - 2, 2176782317))
    

    That gives 6232931 as the multiplicative inverse. Check it.

    print((1173770130 * 6232931) % 2176782317)
    

    It gives 1, which is what we want.

    Now let's turn these magic numbers into working code.

    # Set up the character/number code.
    alphabet = '0123456789ABCDEFGHIJKLMONPQRSTUVWXYZ'
    
    chr_to_pos = {}
    for i in range(len(alphabet)):
        chr_to_pos[alphabet[i]] = i
    
    
    def id_to_code (n):
        m = (n * 1173770130) % 2176782317
        answer = ''
        for _ in range(6):
            answer += alphabet[m % 36]
            m = m // 36
        return answer
    
    def code_to_id (code):
        m = 0
        for i in range(len(code)):
            m += chr_to_pos[code[i]] * 36**i
        return (m * 6232931) % 2176782317
    
    
    for i in range(20):
        print(i, id_to_code(i), code_to_id(id_to_code(i)))
    

    And there you have it. Easy to reverse codes with a not very easy to see pattern and a range of a couple of billion.

    (DISCLAIMER: I took you at your word that you didn't need this to be secure. It is most definitely not secure.)