Tags: c++, c, utf-8, latin1, string-conversion

How to convert a String from UTF8 to Latin1 in C/C++?


The question I have is quite simple, but I couldn't find a solution so far:

How can I convert a UTF8 encoded string to a latin1 encoded string in C++ without using any extra libs like libiconv?

Every example I could find so far goes the other way, from latin1 to UTF8.


Solution

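  • #include <cstddef>   // std::size_t

    typedef unsigned value_type;

    // Length in bytes of the UTF-8 sequence whose lead byte is at p.
    // Assumes p points at a lead byte, not at a continuation byte.
    template <typename Iterator>
    std::size_t get_length (Iterator p)
    {
        unsigned char c = static_cast<unsigned char> (*p);
        if (c < 0x80) return 1;            // 0xxxxxxx
        else if (!(c & 0x20)) return 2;    // 110xxxxx
        else if (!(c & 0x10)) return 3;    // 1110xxxx
        else if (!(c & 0x08)) return 4;    // 11110xxx
        else if (!(c & 0x04)) return 5;    // 111110xx (legacy 5-byte form)
        else return 6;                     // 1111110x (legacy 6-byte form)
    }

    // Decodes the sequence starting at p. If Iterator is a reference type,
    // p is left on the last byte of that sequence.
    template <typename Iterator>
    value_type get_value (Iterator p)
    {
        std::size_t len = get_length (p);

        if (len == 1)
            return *p;

        // Payload bits of the lead byte, shifted into position.
        value_type res = static_cast<unsigned char> (
                                        *p & (0xff >> (len + 1)))
                                         << ((len - 1) * 6);

        // Each continuation byte contributes its low six bits.
        for (--len; len; --len)
            res |= (static_cast<unsigned char> (*(++p)) & 0x3f) << ((len - 1) * 6);

        return res;
    }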
    

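    get_value returns the Unicode code point of the UTF-8 sequence starting at p. If the iterator is passed by reference, it is left on the last byte of that sequence, so the loop's ++p then moves on to the next character. You can now convert a string using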

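    for (std::string::iterator p = s_utf8.begin(); p != s_utf8.end(); ++p)
    {
         // Pass p by reference so get_value can advance it over continuation bytes.
         value_type value = get_value<std::string::iterator&>(p);
         if (value > 0xff)
             throw "AAAAAH!";   // code point has no Latin-1 equivalent
         s_latin1.push_back(static_cast<char>(value));
    }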
    

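    No guarantees, the code is quite old :) In particular, it does not validate its input, so malformed or truncated UTF-8 can make it read past the end of the string.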
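
For the Latin-1 case specifically, the decoding can be narrowed quite a bit. Only code points up to U+00FF are representable, and in UTF-8 those are either a single ASCII byte or a two-byte sequence whose lead byte is 0xC2 or 0xC3. For example, "é" is 0xC3 0xA9, and ((0xC3 & 0x1F) << 6) | (0xA9 & 0x3F) = 0xE9. The following is a minimal sketch under that assumption, not a drop-in replacement for the code above: the function name utf8_to_latin1 and the choice of std::runtime_error are mine, and it simply rejects anything malformed or out of range.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of the answer above): converts UTF-8 to
// Latin-1, throwing on malformed input or on code points above U+00FF.
std::string utf8_to_latin1(const std::string& utf8)
{
    std::string latin1;
    latin1.reserve(utf8.size());

    for (std::size_t i = 0; i < utf8.size(); ++i)
    {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);

        if (lead < 0x80)
        {
            // Plain ASCII is identical in both encodings.
            latin1.push_back(static_cast<char>(lead));
        }
        else if (lead == 0xC2 || lead == 0xC3)
        {
            // Only these two lead bytes encode U+0080..U+00FF.
            if (i + 1 >= utf8.size())
                throw std::runtime_error("truncated UTF-8 sequence");

            const unsigned char trail = static_cast<unsigned char>(utf8[++i]);
            if ((trail & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");

            // Five payload bits from the lead byte, six from the trail byte.
            latin1.push_back(static_cast<char>(((lead & 0x1F) << 6) | (trail & 0x3F)));
        }
        else
        {
            // Stray continuation byte, overlong form, or code point > U+00FF.
            throw std::runtime_error("malformed UTF-8 or not representable in Latin-1");
        }
    }
    return latin1;
}
```

Calling utf8_to_latin1("caf\xC3\xA9") should give the four bytes "caf\xE9". Whether to throw, skip, or substitute a placeholder such as '?' for unrepresentable characters is a design choice; throwing just mirrors the behaviour of the loop above.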