ruby string base62

string to integer conversion omitting first character of charset


This is more of a general problem than a Ruby-specific one; I just happen to be doing it in Ruby. I am trying to convert a string into an integer (Integer, Long, Bigint, or whatever you want to call it) using a charset, for example Base62 (0-9a-zA-Z).

The problem is that when I convert a string like "0ab" into an integer and that integer back into a string, I get "ab". This happens with any string starting with the first character of the alphabet.

Here is an example implementation that has the same issue:

https://github.com/jtzemp/base62/blob/master/lib/base62.rb

In action:

2.1.3 :001 > require 'base62'
 => true
2.1.3 :002 > Base62.decode "0ab"
 => 2269 
2.1.3 :003 > Base62.encode 2269
 => "ab"

I might be missing the obvious.

How can I convert in both directions without losing that leading character?


Solution

  • You're correct that this is a more general problem.

    One solution is to use "padding": extra information carried along with the value to record what a lossy conversion would otherwise drop, such as missing bits or, in this case, leading zero digits.

    In your particular code, you lose the leading character whenever it is the first character of the alphabet. That character has index zero, so it contributes nothing to the integer: judging by the output above, "0ab" decodes to 0·62² + 36·62 + 37 = 2269, which is exactly what "ab" decodes to.

    In your code, the padding could be accomplished in a variety of ways.

    For example, you could prepend a fixed leading character that is not the first character of the alphabet before decoding, and strip it off again after encoding.

    Essentially, you need to choose a way to protect the zero value, so it is not lost by the int.

    An alternate solution is to change your storage from using an int to using any other object that doesn't lose leading zeros, such as a string. This is how a typical Base64 encoding class does it: the input is a string, and the storage is also a string.
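
The prepend-a-character idea can be sketched roughly as follows. This is only an illustration, not part of the base62 gem: the module name `PaddedBase62` and the `SENTINEL` constant are made up here, and the alphabet order (0-9, A-Z, a-z) is an assumption chosen because it reproduces the 2269 result shown in the question.

```ruby
# Hypothetical helper, not part of the base62 gem.
module PaddedBase62
  # Assumed alphabet order: 0-9, A-Z, a-z (this gives "ab" => 2269).
  ALPHABET = [*('0'..'9'), *('A'..'Z'), *('a'..'z')].join.freeze
  BASE = ALPHABET.length
  # Any character other than the zero digit works as the guard.
  SENTINEL = ALPHABET[1]

  # Decode with the sentinel in front, so leading zero digits in str
  # are no longer the most significant digits and cannot be lost.
  def self.decode(str)
    (SENTINEL + str).each_char.reduce(0) do |acc, ch|
      idx = ALPHABET.index(ch)
      raise ArgumentError, "invalid character: #{ch.inspect}" if idx.nil?
      acc * BASE + idx
    end
  end

  # Encode, then check for and strip the sentinel added by decode.
  def self.encode(int)
    digits = []
    while int > 0
      int, rem = int.divmod(BASE)
      digits.unshift(ALPHABET[rem])
    end
    unless digits.first == SENTINEL
      raise ArgumentError, 'integer was not produced by PaddedBase62.decode'
    end
    digits.drop(1).join
  end
end

n = PaddedBase62.decode('0ab')   # => 240597
PaddedBase62.encode(n)           # => "0ab"
```

With this scheme "0ab" and "ab" decode to different integers, so the round trip is lossless. The cost is that every integer is one Base62 digit larger than it needs to be, and the values are not interchangeable with those from a plain Base62 decode.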