c++ utf-8 special-characters decode icu

Conversion of special characters to Unicode in C++


Currently, I have the character ° (a degree symbol) that I need to convert to /00B0. I noticed that there is a library called ICU for C/C++, but do I need such a library for this? My input is encoded as ISO/IEC 8859-1.

Do the standard C++ libraries already implement such a decode function, or is the ICU library needed for such operations?

If there is a function that can be called on a character such as °, please point me to it or write up a quick example. :)

EDIT: I cycle through an entire line, and when I see a special character (that is, any character that isn't a letter, a digit, '-', or ' '), I print the character that failed those tests.

I get output like \303, which is the octal representation of the special character's byte. Here's the code I use for the tests:

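// Cast to unsigned char first: passing a negative char to isalpha/isdigit is undefined behavior.
unsigned char c = static_cast<unsigned char>(aline[i+1]);
if (isalpha(c) || isdigit(c) || aline[i+1] == '-' || aline[i+1] == ' ')
   regionName.push_back(aline[i+1]);
else
   cout << aline[i+1] << endl;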

So when the else branch executes, I get octal output by default. How would I change that to Unicode format?

Example output:

\303
\203
\302
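
Streaming a char to cout writes the raw byte, so the \303-style values above are probably just how the terminal or viewer renders bytes above 127. A minimal sketch, assuming the goal is to see each byte's numeric value: the helper byteToHex is hypothetical (it is not part of the original code) and converts the char to an unsigned value before formatting it in hex.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper (not from the original post): formats one byte as
// two lowercase hex digits.
std::string byteToHex(char c)
{
    std::ostringstream out;
    // Convert via unsigned char so bytes above 127 are not sign-extended,
    // then widen to unsigned int so the stream prints a number, not a character.
    out << std::hex << std::setw(2) << std::setfill('0')
        << static_cast<unsigned int>(static_cast<unsigned char>(c));
    return out.str();
}
```

With this sketch, byteToHex('\303') gives "c3", and byteToHex('\260') gives "b0", which is the ISO/IEC 8859-1 byte for °.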

Solution

  • Welp, here's the answer I needed :) Works great!

    Include the following headers:

    #include <sstream>
    #include <iomanip>
    #include <string>
    

    Then pass any string to the function; it escapes every 'special' character (any byte above 127) as \u followed by four hex digits:

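    // Assumes ISO/IEC 8859-1 input, where each byte value equals its Unicode code point.
    static std::string EncodeNonASCIICharacters(const std::string& value)
    {
        std::ostringstream stringBuilder;

        for (std::size_t i = 0; i < value.length(); i++)
        {
            // Go through unsigned char so bytes above 127 do not become negative.
            unsigned int character = static_cast<unsigned char>(value[i]);
            if (character > 127)
            {
                // Emit \u followed by the byte value as four zero-padded hex digits.
                stringBuilder << "\\u";
                stringBuilder << std::setw(4) << std::hex << std::setfill('0') << character;
            } else {
                // ASCII characters pass through unchanged.
                stringBuilder << value[i];
            }
        }

        return stringBuilder.str();
    }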
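
    The function above treats every byte as one character, which fits ISO/IEC 8859-1. Bytes such as \303 and \302 in the question's output are typical UTF-8 lead bytes, so the input may actually be UTF-8, in which case each byte of a multi-byte character would be escaped separately. A rough sketch under that assumption follows. EncodeNonASCIIFromUtf8 is a hypothetical name, not part of the original answer, and it only decodes 1- to 3-byte sequences. Anything else, including 4-byte sequences and malformed input, falls back to byte-by-byte escaping.

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical variant (not from the original answer): treats the input as
// UTF-8 and escapes each non-ASCII code point as \uXXXX.
// Assumption: 1- to 3-byte sequences only; other bytes are escaped one by one.
std::string EncodeNonASCIIFromUtf8(const std::string& value)
{
    std::ostringstream out;
    out << std::hex << std::setfill('0');  // both settings persist on the stream

    std::size_t i = 0;
    while (i < value.size())
    {
        unsigned int lead = static_cast<unsigned char>(value[i]);
        if (lead < 0x80) {
            out << value[i];  // plain ASCII passes through unchanged
            ++i;
            continue;
        }

        unsigned int codePoint = lead;
        std::size_t extra = 0;  // number of continuation bytes expected
        if ((lead & 0xE0) == 0xC0) {
            codePoint = lead & 0x1F;  // 110xxxxx: two-byte sequence
            extra = 1;
        } else if ((lead & 0xF0) == 0xE0) {
            codePoint = lead & 0x0F;  // 1110xxxx: three-byte sequence
            extra = 2;
        }

        // Fold in the continuation bytes (10xxxxxx), six payload bits each.
        bool ok = extra > 0 && i + extra < value.size();
        for (std::size_t k = 1; ok && k <= extra; ++k) {
            unsigned int next = static_cast<unsigned char>(value[i + k]);
            if ((next & 0xC0) != 0x80)
                ok = false;
            else
                codePoint = (codePoint << 6) | (next & 0x3F);
        }
        if (!ok) {
            // Truncated, malformed, or unsupported: escape the single raw byte.
            codePoint = lead;
            extra = 0;
        }

        out << "\\u" << std::setw(4) << codePoint;  // setw applies to this value only
        i += extra + 1;
    }
    return out.str();
}
```

    For example, the UTF-8 string "25\302\260C" (25°C) comes out as 25\u00b0C, whereas the byte-wise function would produce 25\u00c2\u00b0C for the same input. For production use, a vetted library such as ICU is likely the safer choice for full validation.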