c++ visual-studio encoding ifstream

Characters not recognized while reading from file


I have the following C++ code in Visual Studio to read characters from a file:

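    #include <cerrno>    // errno
    #include <cstring>   // strerror
    #include <fstream>
    #include <iostream>
    using namespace std;

    int main(int argc, char* argv[]) {
        if (argc < 2) return 1;  // no input file given

        ifstream infile;
        infile.open(argv[1]);

        if (infile.fail()) {
            cout << "Error reading from file: " << strerror(errno) << endl;
            cout << argv[0] << endl;
        }
        else {
            char currentChar;

            // Read the file one char (that is, one byte) at a time
            while (infile.get(currentChar)) {
                cout << currentChar << " " << int(currentChar) << endl;
                //... do something with currentChar
            }

            ofstream outfile("output.txt");
            // outfile << ...;  (output some text based on currentChar)
        }
        infile.close();
    }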

The file in this case is expected to contain mostly normal ASCII characters, with the exception of two: and .

The problem is that the code in its current form does not recognize those characters. Printing the character with cout outputs garbage, and its int conversion yields a negative number that differs depending on where in the file it occurs.

I suspect the problem is the encoding, so I've tried to imbue infile based on some examples on the internet, but I haven't gotten it right: either infile.get fails when it reaches the quote character, or the problem remains. What details am I missing?


Solution

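  • The file you are trying to read is likely UTF-8 encoded. Most characters read fine because UTF-8 is backward compatible with ASCII: ASCII characters are stored as the same single bytes. Every non-ASCII character, however, is stored as a sequence of two to four bytes, each with a value of 0x80 or above. Since char is signed in Visual C++, each of those bytes prints as a negative number, and get() returns them one byte at a time instead of as one character.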

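    To read a UTF-8 file, I'll refer you to std::codecvt_utf8: http://en.cppreference.com/w/cpp/locale/codecvt_utf8 (note that std::codecvt_utf8 and the <codecvt> header were deprecated in C++17, although major implementations still ship them).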

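    #include <codecvt>   // std::codecvt_utf8 (deprecated since C++17)
    #include <fstream>
    #include <iostream>
    #include <locale>
    #include <sstream>   // std::wstringstream
    #include <string>

    int main() {
        // Write file in UTF-8; generate_header also writes a byte order mark.
        // std::locale() is used instead of the MSVC-only std::locale::empty().
        std::wofstream wof;
        wof.imbue(std::locale(std::locale(), new std::codecvt_utf8<wchar_t, 0x10ffff, std::generate_header>));
        wof.open("file.txt");
        wof << L"This is a test.";
        wof << L"This is another test.";
        wof << L"\nThis is the final test.\n";
        wof.close();

        // Read file in UTF-8; consume_header skips the byte order mark if present
        std::wifstream wif("file.txt");
        wif.imbue(std::locale(std::locale(), new std::codecvt_utf8<wchar_t, 0x10ffff, std::consume_header>));

        std::wstringstream wss;
        wss << wif.rdbuf();  // wss.str() now holds the decoded wide characters
    }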
    

    (from here)