
c++ read file with accents


Good day. I am working on a small project that needs to read .txt files. Some of the files are in English and others in Spanish, so some of the data contains accented characters, and I must show it on the console with the accents intact.

I have no problem displaying accents on the console with setlocale(LC_CTYPE, "C");

My problem is with reading the .txt file: the accents are not detected, and strange characters are read instead.

my practice code is:

#include <iostream>
#include <locale.h>
#include<fstream>
#include<string>

using namespace std;

int main(){
    
    setlocale (LC_CTYPE, "C");

    ifstream file;
    string text;
    
    file.open("entryDisciplineESP.txt",ios::in);
    
    if (file.fail()){
        
        cout<<"The file could not be opened."<<endl;
        
        exit(1); 
        
    }
    
    while(!file.eof()){ 

        getline(file,text);
        
        cout<<text<<endl;
        
    }
    
    cout<<endl;
    
    system("Pause");
    return 0;
}

The .txt file in question contains:

Inicio
D1
Biatlón
S1
255
E1
Esprint 7,5 km (M); 100; 200
E2
Persecucion 10 km (M); 100; 200
ff

Obviously I'm having problems with 'ó', but I also have other .txt files with other accented characters, so I need a solution that works for all of them.

While researching, I read about wstring and wifstream and tried to use them, but I have not been able to get them working.

I'm trying to achieve this on Windows, but I also need the solution to work on Linux. At the moment I'm using Dev-C++ 5.11.

Thank you very much in advance for your time and help.


Solution

  • One error is how you control your read loop. See: Why !.eof() inside a loop condition is always wrong. Instead, control the loop with the stream state returned by the read function itself, e.g.

        while (getline(file,text)) {
            
            std::cout << text << '\n';
            
        }
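
    To make the difference concrete, here is a minimal sketch that counts loop iterations for both styles over an in-memory stream. The function names are hypothetical and exist only for this illustration; std::istringstream stands in for the file.

    ```cpp
    #include <cstddef>
    #include <sstream>
    #include <string>

    // Counts loop iterations when the loop is controlled by !eof(),
    // as in the question's code. After the last line has been read,
    // eof() is still false, so the body runs once more: that final
    // getline fails and leaves an empty string behind.
    std::size_t count_with_eof_loop(const std::string& data) {
        std::istringstream in(data);
        std::string line;
        std::size_t n = 0;
        while (!in.eof()) {
            std::getline(in, line);   // may fail; the result is ignored
            ++n;
        }
        return n;
    }

    // Counts iterations when getline itself controls the loop, so the
    // body only runs after a successful read.
    std::size_t count_with_getline_loop(const std::string& data) {
        std::istringstream in(data);
        std::string line;
        std::size_t n = 0;
        while (std::getline(in, line)) {
            ++n;
        }
        return n;
    }
    ```

    For input that ends with a newline, such as "Inicio\nD1\n", the eof-controlled loop runs three times while the getline-controlled loop runs twice. In the question's program that extra iteration shows up as an extra blank line at the end of the output.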
    

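    The character in question, 'ó', is not ASCII: assuming the file is saved as UTF-8, it is stored as the two bytes 0xc3 0xb3. std::string holds those bytes unchanged and std::cout writes them unchanged, so no wide-character types are needed. Your full example, also removing using namespace std; (see: Why is “using namespace std;” considered bad practice?), would be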

    #include <clocale>      /* std::setlocale */
    #include <cstdlib>      /* std::system */
    #include <fstream>
    #include <iostream>
    #include <string>
    
    int main() {
        
        std::setlocale (LC_CTYPE, "C");
    
        std::ifstream file;
        std::string text;
        
        file.open ("entryDisciplineESP.txt");
        
        if (file.fail()) {
            
            std::cerr << "The file could not be opened.\n";
            
            return 1;
        }
        
        /* loop on the stream state returned by getline, not on eof() */
        while (getline(file,text)) {
            
            std::cout << text << '\n';
        }
        
        std::cout.put('\n');
        
    #ifdef _WIN32
        std::system("Pause");
    #endif
        return 0;
    }
    

    Example Output

    $ ./bin/accent_read
    Inicio
    D1
    Biatlón
    S1
    255
    E1
    Esprint 7,5 km (M); 100; 200
    E2
    Persecucion 10 km (M); 100; 200
    ff
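
    The output above assumes the file is saved as UTF-8. If you are not sure how a file is encoded, a small byte dump can tell you. The following is only a sketch, and hex_bytes is a hypothetical helper name, not part of any library.

    ```cpp
    #include <string>

    // Returns the bytes of `s` as space-separated lowercase hex pairs,
    // e.g. "c3 b3" for a UTF-8 encoded 'ó'.
    std::string hex_bytes(const std::string& s) {
        static const char digits[] = "0123456789abcdef";
        std::string out;
        for (std::string::size_type i = 0; i < s.size(); ++i) {
            // Go through unsigned char so bytes >= 0x80 are not sign-extended.
            const unsigned char c = static_cast<unsigned char>(s[i]);
            if (i != 0) out += ' ';
            out += digits[c >> 4];
            out += digits[c & 0x0F];
        }
        return out;
    }
    ```

    Calling hex_bytes on the line read from the file and printing the result shows what is really stored: "c3 b3" in place of the 'ó' means UTF-8, while a single "f3" would point to a single-byte encoding such as Latin-1 or Windows-1252, in which case the UTF-8 codepage discussed below is not the right one for that file.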
    

    Windows 10 Using UTF-8 Codepage

    The problem you experience when running the above code in the Windows 10 console (which I presume is where DevC++ launches the output) is that the default codepage (437 - OEM United States) does not interpret UTF-8: each byte of a multi-byte UTF-8 sequence is displayed as a separate character. To display UTF-8, change the codepage to 65001 - Unicode (UTF-8). See Code Page Identifiers.

    To get the proper output after compiling under VS with the C++17 language standard, all that was needed was to change the codepage with chcp 65001 in the console. (The console font must also contain the accented glyphs; mine is set to Lucida Console.)

    Output In Windows Console (Command Prompt) After Setting Codepage

    C:\Users\david\source\repos\accents>chcp 65001
    Active code page: 65001
    
    C:\Users\david\source\repos\accents>Debug\accents.exe
    Inicio
    D1
    Biatlón
    S1
    255
    E1
    Esprint 7,5 km (M); 100; 200
    E2
    Persecucion 10 km (M); 100; 200
    ff
    
    Press any key to continue . . .
    

    Because DevC++ launches the console automatically, you also need to set the codepage programmatically. You can do that with SetConsoleOutputCP (65001). For example:

    ...
    #include <windows.h>
    ...
    #ifndef CP_UTF8         /* <windows.h> normally defines this already */
    #define CP_UTF8 65001
    #endif
    
    int main () {
    
        // setlocale (LC_CTYPE, "C");           /* not needed */
        
        /* set console output codepage to UTF-8 */
        if (!SetConsoleOutputCP(CP_UTF8)) {
            std::cerr << "error: unable to set UTF-8 codepage.\n";
            return 1;
        }
        ...
    

    See SetConsoleOutputCP function. The analogous function for setting the input codepage is SetConsoleCP(UINT wCodePageID).
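
    Since the question also asks for Linux support, one possible way to keep a single source file is to wrap the Windows-only call in a small helper. This is a sketch under assumptions: enable_utf8_console and copy_lines are hypothetical names, and on non-Windows systems the terminal is assumed to expect UTF-8 already.

    ```cpp
    #include <istream>
    #include <ostream>
    #include <string>

    #ifdef _WIN32
    #include <windows.h>
    #endif

    // Hypothetical helper: switches the Windows console output codepage
    // to UTF-8. On other platforms it does nothing and reports success,
    // on the assumption that the terminal already expects UTF-8.
    bool enable_utf8_console() {
    #ifdef _WIN32
        return SetConsoleOutputCP(CP_UTF8) != 0;
    #else
        return true;
    #endif
    }

    // Hypothetical helper: copies `in` to `out` line by line, leaving
    // the bytes of each line untouched.
    void copy_lines(std::istream& in, std::ostream& out) {
        std::string line;
        while (std::getline(in, line)) {
            out << line << '\n';
        }
    }
    ```

    In main, you would call enable_utf8_console() once before the first write to std::cout, report an error if it returns false, and then pass the opened std::ifstream and std::cout to copy_lines.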

    Output Using SetConsoleOutputCP()

    Setting the console to the default 437 codepage and then calling SetConsoleOutputCP (65001) in the program to set the output codepage to UTF-8 gives the same result, e.g.

    C:\Users\david\source\repos\accents>chcp 437
    Active code page: 437
    
    C:\Users\david\source\repos\accents>Debug\accents.exe
    Inicio
    D1
    Biatlón
    S1
    255
    E1
    Esprint 7,5 km (M); 100; 200
    E2
    Persecucion 10 km (M); 100; 200
    ff
    
    Press any key to continue . . .
    

    Also, look through the DevC++ project (or program) settings to see whether the output codepage can be set there. (I don't use DevC++, so I don't know whether that is possible.)