c++ string ifstream

Leading bad characters


The first word I read from a file using ifstream comes back with 3 bad characters at the beginning. I have tried trimming the string, but to no avail.

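#include <algorithm>
#include <cctype>
#include <string>
using namespace std;

// Strips leading/trailing spaces and tabs, then lowercases the result.
string trim(const string& in){
    // find_first_not_of returns size_t, so keep that type instead of narrowing to int
    size_t strBegin = in.find_first_not_of(" \t");
    if (strBegin == string::npos)
        return ""; // nothing but spaces and tabs

    size_t strEnd = in.find_last_not_of(" \t");
    size_t strRange = strEnd - strBegin + 1;

    string out = in.substr(strBegin, strRange);
    // go through unsigned char: passing a negative char to tolower is undefined behavior
    transform(out.begin(), out.end(), out.begin(),
              [](unsigned char c){ return static_cast<char>(tolower(c)); });

    return out;
}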

And here is the output I am getting when reading "Hello," the first word in my file.

� i: 0
�� i: 1
�he i: 2
hell i: 3
ello i: 4
llo i: 5
lo i: 6
o i: 7

And this is printed from:

for(int i=0; i<temp.length(); i++){
    cout<<temp.substr(i,i+1)<<" i: "<<i<<endl;
}


Output of cat -vt on the text file:

M-oM-;M-?Hello, world test

The file actually contains "Hello, World test" as its first line.


Solution

  • You have a UTF-8 byte order mark at the beginning of your document consisting of the 3-byte sequence EF BB BF in hex (which is equivalent to M-oM-;M-?, in the language of cat -vt).

    You need to either modify your program to deal with the BOM appropriately (i.e. ignore it) or re-encode your text file to remove the BOM. Many text editing programs will not display the BOM in your document, so you may think it's not there, but it's there nonetheless.
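
One way to have the program ignore the BOM is to drop the three bytes EF BB BF from the first string read from the file. The following is only a minimal sketch under that assumption; the helper name stripUtf8Bom is hypothetical and does not come from the original post, and it only handles the UTF-8 BOM, not other encodings.

```cpp
#include <string>

// Hypothetical helper (not from the original post): removes a leading
// UTF-8 byte order mark (bytes EF BB BF) from s, if one is present.
// Returns true when a BOM was found and removed, false otherwise.
bool stripUtf8Bom(std::string& s) {
    static const std::string bom = "\xEF\xBB\xBF";
    // compare() only matches when s starts with all three BOM bytes;
    // a shorter or different prefix leaves s untouched.
    if (s.compare(0, bom.size(), bom) == 0) {
        s.erase(0, bom.size());
        return true;
    }
    return false;
}
```

Assuming the first word is read into temp as in the question, calling stripUtf8Bom(temp) before trim(temp) should leave the word starting at "Hello,". Only the first string read from the file needs this check, since the BOM can appear only at the very start.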