c++, endianness, byte-order-mark

C program reads BOM in reverse (Go left... No! The other left)


I'm ... confused. Here's the thing: I've got an *.ini file encoded as Unicode (Little Endian). In my Visual Studio project (my own ini parser) I check whether the text file has a BOM (Byte Order Mark) at the beginning of the file.

From Wikipedia:

11111111 11111110 (0xFFFE) - little endian BOM,

11111110 11111111 (0xFEFF) - big endian BOM.

So far, I'm right, right?

So it's time for a little code:

    size_t temp_val = 0;
    wchar_t * endianness_val = new wchar_t;
    temp_val = fread_s(endianness_val, sizeof(wchar_t), sizeof(wchar_t), 1, fp);

    if (*endianness_val == (wchar_t)0xFFFE)
    {
        endianness = 1;
        wprintf(L"\n UNICODE(16bit): Little Endian!");
    }
    else if (*endianness_val == (wchar_t)0xFEFF)
    {
        endianness = -1; //big endian
        wprintf(L"\n UNICODE(16bit): Big Endian!");
    }
    else
    {
        endianness = 0; //no BOM, little endian default
        wprintf(L"\n No BOM. Narrow characters (8bit) Assuming Little Endian!");
    }

I'm reading the first wchar_t from the file (using fread_s) and storing it in endianness_val. Everything seems to be good:

  • the *.ini file HAS a Byte Order Mark (0xFFFE),
  • looking into memory while debugging gives me the same result: the endianness_val variable stores 0xFFFE.

Aaaannd Visual Studio keeps going into the if branch for Big Endian (like a maniac ;)). Of course, changing the BOM to Big Endian results in Visual Studio entering the correct if statement. Any ideas why this works backwards?

Thanks.


Solution

  • Try running the following code on your text file open in fp and see if it helps you catch your conceptual error:

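    uint8_t bytes[2];
    uint16_t word;

    /* Read the first two bytes one at a time, in file order. */
    fread(bytes, 1, 2, fp);
    /* Rewind and read the same two bytes as a single 16-bit integer. */
    fseek(fp, 0, SEEK_SET);
    fread(&word, 2, 1, fp);
    fclose(fp);

    /* First line: bytes as stored in the file. Second line: the value as the CPU sees it. */
    wprintf(L"%.2hhX %.2hhX\n", bytes[0], bytes[1]);
    wprintf(L"%.4hX\n", word);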
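
For what it's worth, on a little-endian machine (x86 included) a file that starts with the bytes FF FE should print FF FE on the first line and FEFF on the second. The BOM is the code point U+FEFF, and FF FE is simply how that value looks when it is stored low byte first. A comparison on a 16-bit value read with fread therefore depends on the host byte order, while a comparison on the individual bytes does not. Below is a minimal sketch of the byte-wise check. The names bom_from_bytes and detect_utf16_bom are hypothetical, not from the original post, and the return values (1, -1, 0) only mirror the endianness variable in the question. It assumes fp was opened in binary mode and is positioned at the start of the file.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical helper (not from the original post).
   Classifies two bytes taken from the start of a file, in file order:
     1  -> UTF-16 little-endian BOM (bytes FF FE)
    -1  -> UTF-16 big-endian BOM    (bytes FE FF)
     0  -> no UTF-16 BOM */
static int bom_from_bytes(const uint8_t b[2])
{
    if (b[0] == 0xFF && b[1] == 0xFE) return 1;
    if (b[0] == 0xFE && b[1] == 0xFF) return -1;
    return 0;
}

/* Hypothetical helper: reads the first two bytes of fp and classifies them.
   Assumes fp is at the start of the file. When no BOM is found, the stream
   is rewound so the same bytes can be read again as ordinary text. */
static int detect_utf16_bom(FILE *fp)
{
    uint8_t b[2];
    int result = 0;

    if (fread(b, 1, 2, fp) == 2)
        result = bom_from_bytes(b);
    if (result == 0)
        rewind(fp);
    return result;
}
```

With something like this, the three branches in the question could test the result of detect_utf16_bom(fp) instead of comparing a wchar_t against 0xFFFE or 0xFEFF, and the outcome would no longer depend on the byte order of the machine running the parser.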