Tags: c#, c++, compiler-construction, low-level

How does the computer convert between types


So a common question you see on SO is how to convert between type X and type Y, but what I want to know is how the computer actually does this.

For example, how does it take an int out of a string?

My theory is that a string is a char array at its core, so it goes index by index and checks each character against the ASCII table. If it falls within the range of digits, it is added to the integer. Does it happen at an even lower level than this? Is there bitmasking taking place? How does this happen?

Disclaimer: not for school, just curious.


Solution

  • This question can only be answered when restricting the types to a somewhat manageable subset. To do so, let us consider the three interesting types: strings, integers and floats.

    The only other truly different basic type is a pointer, which is not usually converted in any meaningful manner (even the NULL check is not actually a conversion, but a special built-in semantic for the 0 literal).

    int to float and vice versa

    Converting integers to floats and vice versa is simple, since modern CPUs provide an instruction to deal with that case directly.
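
    For example, a plain cast in C++ is all that is needed at the source level; on a typical x86-64 target the compiler emits a single conversion instruction for each direction (a minimal sketch, instruction names assume x86-64 with SSE2):

    #include <cstdio>

    int main() {
        int i = 42;
        double d = static_cast<double>(i);   // typically a single cvtsi2sd instruction
        int back = static_cast<int>(3.7);    // cvttsd2si, truncates toward zero
        std::printf("%f %d\n", d, back);     // prints "42.000000 3"
    }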

    string to integer type

    Conversion from string to integer is fairly simple, because no rounding issues can arise; only overflow needs to be checked. Indeed, any string is just a sequence of code points (which may or may not be represented by char or wchar_t), and the common method works along the lines of the following:

    #include <limits>
    #include <stdexcept>
    #include <string>

    unsigned string_to_uint(const std::string& str) {
        unsigned result = 0;
        for(size_t i = 0; i < str.size(); ++i) {
            // map '0'..'9' to 0..9; any non-digit wraps around to a value > 9
            unsigned c = static_cast<unsigned>(str[i]) - static_cast<unsigned>('0');
            if(c > 9) {
                if(i) return result; // ok: integer over
                else throw std::runtime_error("no integer found");
            }
            if((std::numeric_limits<unsigned>::max() - c) / 10 < result)
                throw std::runtime_error("integer overflow");
            result = result * 10 + c;
        }
        return result;
    }
    

    If you wish to handle additional bases (e.g. strings like 0x123 as a hexadecimal representation) or negative values, it obviously requires a few more tests, as sketched below, but the basic algorithm stays the same.
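
    For illustration, a hypothetical wrapper around the string_to_uint function above could strip the sign first and leave the digit loop untouched (a rough sketch, not a complete strtol replacement; the base-16 case is only hinted at):

    long string_to_int(const std::string& s) {
        size_t pos = 0;
        bool negative = false;
        if(pos < s.size() && (s[pos] == '+' || s[pos] == '-'))
            negative = (s[pos++] == '-');
        // a hexadecimal branch would check for a "0x"/"0X" prefix here and switch to base 16
        long value = static_cast<long>(string_to_uint(s.substr(pos)));
        return negative ? -value : value;
    }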

    int to string

    As expected, this basically works in reverse: an implementation repeatedly takes the remainder of a division by 10 and then divides by 10. Since this yields the digits in reverse order, one can either print into a buffer from the back or reverse the result at the end.
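
    A minimal sketch of that reverse algorithm (the helper name is made up for this example):

    #include <string>

    std::string uint_to_string(unsigned value) {
        if(value == 0) return "0";
        std::string buf;
        while(value > 0) {
            buf += static_cast<char>('0' + value % 10); // last digit first
            value /= 10;
        }
        return std::string(buf.rbegin(), buf.rend());   // undo the reversal
    }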

    string to floating point type

    Parsing strings to a double (or float) is significantly more complex, since the conversion is supposed to happen with the highest possible accuracy. The basic idea is to read the number as a string of digits, remembering only where the dot was and what the exponent is. From this information you would then assemble the mantissa (which is basically a 53-bit integer) and the exponent, build the actual bit pattern for the resulting number, and copy it into your target value.

    While this approach works perfectly fine, there are literally dozens of different approaches in use, all varying in performance, correctness and robustness.
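
    A deliberately naive sketch of that idea: it collects the digits and the decimal exponent implied by the dot, then scales in a single step. Unlike real implementations (e.g. David Gay's strtod) it makes no attempt at correct rounding, and it ignores signs and an "e" exponent suffix:

    #include <cmath>
    #include <cstdint>
    #include <string>

    double naive_string_to_double(const std::string& s) {
        std::uint64_t digits = 0;   // the decimal "mantissa" as read
        int dec_exp = 0;            // decimal exponent implied by the dot
        bool seen_dot = false;
        for(char ch : s) {
            if(ch == '.') { seen_dot = true; continue; }
            if(ch < '0' || ch > '9') break;    // sign and "e" exponent handling omitted
            digits = digits * 10 + static_cast<std::uint64_t>(ch - '0');
            if(seen_dot) --dec_exp;            // each fractional digit shifts by 10^-1
        }
        return static_cast<double>(digits) * std::pow(10.0, dec_exp);
    }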

    Actual implementations

    Note that actual implementations may have to deal with one more important (and horribly ugly) thing, which is the locale. For example, in the German locale the "," is the decimal point and not the thousands separator, so pi is roughly "3,1415926535".

    Perl string to double
    TCL string to double
    David M. Gay AT&T Paper string to double, double to string and source code
    Boost Spirit