
Is there a standardized way to convert a string to an int in any base?


I am trying to write my own C function, btoi(char *str, int base), that accepts any base from 2 to 64. However, after reading a bit, I realise I might be opening a big can of worms.

I say this because the binary, octal, decimal, hexadecimal, base-32 and base-64 alphabets are either universal or well defined in RFC 4648. However, I had assumed that anything up to base 62 would simply continue the 0-9 + A-Z + a-z alphabet, so reading section 7 of RFC 4648 took me aback: "regular" base-32 is A-Z + 2-7.

To complicate things further, padding is also a problem.

My question is: Is there a standardized way to convert a string to an int in any base (up to 64)?
Or is it up to me how I implement it?


Solution

  • You're misunderstanding what RFC 4648 is for.

    It doesn't dictate which characters should be used to write a number in bases 16, 32, and 64. It describes three different ways to encode binary data as ASCII text.

    In the case of base64, it takes three 8-bit values, regroups them as four 6-bit values, then outputs one ASCII character per 6-bit value. Below is an example of this from the RFC:

          Input data:  0x14fb9c03d97e
          Hex:     1   4    f   b    9   c     | 0   3    d   9    7   e
          8-bit:   00010100 11111011 10011100  | 00000011 11011001 01111110
          6-bit:   000101 001111 101110 011100 | 000000 111101 100101 111110
          Decimal: 5      15     46     28       0      61     37     62
          Output:  F      P      u      c        A      9      l      +
    

    The above shows how the byte values 0x14 0xfb 0x9c 0x03 0xd9 0x7e are converted to the ASCII string FPucA9l+.
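The regrouping in the RFC example can be sketched in C. This is a minimal illustration, not a full encoder: `b64_encode_groups` is a hypothetical helper name, it uses the alphabet from section 4 of RFC 4648, and it assumes the input length is a multiple of 3 so that padding never comes up.

```c
#include <stddef.h>
#include <stdint.h>

/* RFC 4648 section 4 alphabet: value 0 maps to 'A', value 63 maps to '/'. */
static const char b64_alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Hypothetical helper: encodes complete 3-byte groups only.
 * Trailing bytes that do not fill a group are ignored (no padding support).
 * `out` must have room for (len / 3) * 4 + 1 characters. */
void b64_encode_groups(const uint8_t *in, size_t len, char *out)
{
    size_t o = 0;

    for (size_t i = 0; i + 2 < len; i += 3) {
        /* Pack three 8-bit values into one 24-bit group. */
        uint32_t group = ((uint32_t)in[i] << 16)
                       | ((uint32_t)in[i + 1] << 8)
                       | (uint32_t)in[i + 2];

        /* Split the 24 bits into four 6-bit values and look each one up. */
        out[o++] = b64_alphabet[(group >> 18) & 0x3F];
        out[o++] = b64_alphabet[(group >> 12) & 0x3F];
        out[o++] = b64_alphabet[(group >> 6) & 0x3F];
        out[o++] = b64_alphabet[group & 0x3F];
    }
    out[o] = '\0';
}
```

Note that the characters are looked up per 6-bit chunk of the byte stream; nothing here treats the string as the digits of a single number, which is why this alphabet says nothing about how to write an integer in base 64.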

    As for which alphabet is considered standard for numbers in bases 2-36, the most common convention is 0-9 for the values 0-9 and the letters a-z or A-Z for the values 10-35 (i.e. case insensitive).

    The standard library function strtol already does this for you, for any base from 2 to 36.
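As a rough sketch of how strtol could back the function from the question: the wrapper below keeps the name `btoi` but uses a hypothetical signature with an output parameter so that errors can be reported, and it only accepts bases 2 to 36, since that is what strtol supports. Rejecting trailing characters and values outside the range of int are design choices made here, not requirements of the standard.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical wrapper around strtol.
 * Returns 0 and stores the value in *out on success.
 * Returns -1 if the base is outside 2..36, no digits were found,
 * unparsed characters remain, or the value does not fit in an int. */
int btoi(const char *str, int base, int *out)
{
    char *end;
    long value;

    if (base < 2 || base > 36)
        return -1;

    errno = 0;
    value = strtol(str, &end, base);

    /* end == str means no conversion happened; *end != '\0' means
     * the string contains characters that are not digits in this base. */
    if (end == str || *end != '\0')
        return -1;

    /* strtol sets errno to ERANGE on overflow of long; also check int range. */
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return -1;

    *out = (int)value;
    return 0;
}
```

With this sketch, `btoi("ff", 16, &n)` and `btoi("FF", 16, &n)` both store 255, while `btoi("12", 2, &n)` fails because '2' is not a binary digit. Keep in mind that strtol itself also accepts leading whitespace, an optional sign, and an optional 0x prefix when the base is 16.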