
How can I read accented characters in C++ and use them with isalnum?


I am programming in French and, because of that, I need to use accented characters. I can output them by using #include <locale> and setlocale(LC_ALL, ""), but there seems to be a problem when I read accented characters. Here is a simple example I made to show the problem:

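#include <cctype>     // isalnum
#include <cstdlib>    // system
#include <cstring>    // strchr
#include <locale>
#include <iostream>
#include <string>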

using namespace std;

const string SymbolsAllowed = "+-*/%";

int main()
{
    setlocale(LC_ALL, "");    // makes accents printable

    // Translation: Please write some accented text
    // 'é' is shown correctly:
    cout << "Veuillez écrire du texte accentué : ";

    string accentedString;
    getline(cin, accentedString);

    // Accented characters are not shown correctly:
    cout << "Accented string written : " << accentedString << endl;

    for (unsigned int i = 0; i < accentedString.length(); ++i)
    {
        char currentChar = accentedString.at(i);

        // The program crashes while testing if currentChar is alphanumeric.
        // (error image below) :
        if (!isalnum(currentChar) && !strchr(SymbolsAllowed.c_str(), currentChar))
        {
            cout << endl << "Character not allowed : " << currentChar << endl;
            system("pause");
            return 1;
        }
    }

    cout << endl << "No unauthorized characters were written." << endl;

    system("pause");
    return 0;
}

Here is an example of the output before the program crashes:

Veuillez écrire du texte accentué : éèàìù
Accented string written : ʾS.?—

I noticed that the Visual Studio debugger shows that I entered something different from what the program outputs:

[0] -126 '‚'    char
[1] -118 'Š'    char
[2] -123 '…'    char
[3] -115 ''     char
[4] -105 '—'    char

The error message seems to say that only values between -1 and 255 can be used but, according to the ASCII table, the values of the accented characters I used in the example above do not exceed this limit.

The error dialog that pops up shows: Expression: c >= -1 && c <= 255

Can someone please tell me what I am doing wrong or give me a solution for this? Thank you in advance. :)


Solution

    1. char is a signed type on your system (indeed, on many systems) so its range of values is -128 to 127. Characters whose codes are between 128 and 255 look like negative numbers if they are stored in a char, and that is actually what your debugger is telling you:

      [0] -126 '‚'    char
      

      That's -126, not 126. In other words, 130 or 0x82.

    2. isalnum and friends take an int as an argument, which (as the error message indicates) is constrained to the value EOF (-1 on your system) or the range 0-255. -126 is not in this range, hence the error. You could cast to unsigned char or (probably better, if it works on Windows) use the two-argument std::isalnum in <locale>.

    3. For reasons which totally escape me, Windows seems to be providing console input in CP-437 but processing output in CP-1252. The high halves of those two code pages are completely different. So when you type é, it gets sent to your program as 130 (0x82) from CP-437, but when you send that same byte back to the console, it gets printed according to CP-1252 as a low single quotation mark (which looks a lot like a comma, but isn't). So that's not going to work. You need to get input and output onto the same code page.

    4. I don't know a lot about Windows, but you can probably find some useful information in the MS docs. That page includes links to Windows-specific functions which set the input and output code pages.

    5. Intriguingly, the accented characters in the source code of your program appear to be CP-1252, since they print correctly. If you decide to move away from code page 1252 -- for example, by adopting Unicode -- you'll have to fix your source code as well.
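
Points 1 to 4 can be sketched roughly as follows. This is only a sketch under some assumptions: the helper names (isAlnumSafe, firstDisallowed, useCodePage1252) are made up for illustration, and code page 1252 is chosen only because point 5 suggests the source file already uses it. SetConsoleCP and SetConsoleOutputCP are the Windows API functions that set the console input and output code pages; I have not verified that they cure the problem on your particular setup.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

#ifdef _WIN32
#include <windows.h>   // SetConsoleCP, SetConsoleOutputCP
#endif

// Point 1: the debugger's -126 and the byte 0x82 (130) are the same value
// once the char is viewed as an unsigned char.
static_assert(static_cast<unsigned char>(static_cast<char>(-126)) == 0x82,
              "char(-126) is the byte 0x82");

const std::string SymbolsAllowed = "+-*/%";

// Point 2 (hypothetical helper): isalnum that is safe for any char value.
bool isAlnumSafe(char c)
{
    // Converting to unsigned char maps -128..-1 onto 128..255, which is
    // inside the range std::isalnum is allowed to receive.
    return std::isalnum(static_cast<unsigned char>(c)) != 0;
}

// Hypothetical helper: index of the first character that is neither
// alphanumeric nor in SymbolsAllowed, or std::string::npos if there is none.
std::size_t firstDisallowed(const std::string& text)
{
    for (std::size_t i = 0; i < text.size(); ++i)
    {
        const char c = text[i];
        if (!isAlnumSafe(c) && SymbolsAllowed.find(c) == std::string::npos)
            return i;
    }
    return std::string::npos;
}

// Points 3 and 4 (hypothetical helper): put console input and output on the
// same code page. Assumes CP-1252; does nothing outside Windows.
bool useCodePage1252()
{
#ifdef _WIN32
    return SetConsoleCP(1252) != 0 && SetConsoleOutputCP(1252) != 0;
#else
    return true;
#endif
}
```

In the program from the question, useCodePage1252() would be called once at the start of main, and the loop would be replaced by a call to firstDisallowed(accentedString). Whether an accented letter then counts as alphanumeric still depends on the locale selected with setlocale; the cast only removes the out-of-range argument that triggers the assertion.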