
What is *(int*)&data[18] actually doing in this code?


I came across this syntax for reading a BMP file in C++:

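#include <fstream>

int main() {
    // String literals need double quotes; single quotes denote a char literal
    std::ifstream in("filename.bmp", std::ifstream::binary);
    in.seekg(0, in.end);
    std::streamsize size = in.tellg();  // file size in bytes
    in.seekg(0);
    unsigned char * data = new unsigned char[size];
    in.read(reinterpret_cast<char *>(data), size);  // read() expects a char*

    int width = *(int*)&data[18];
    // omitted remainder for minimal example
    delete[] data;
}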

and I don't understand what the line

int width = *(int*)&data[18];

is actually doing. Why doesn't a simple cast to int, int width = (int)data[18];, work?


Solution

  • Note

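    As @user4581301 indicated in the comments, this depends on the implementation and will fail in many cases; for example, it assumes a 32-bit int and a little-endian machine. And as @NathanOliver- Reinstate Monica and @ChrisMM pointed out, it is undefined behavior: it reads unsigned char storage through an int lvalue, which violates the strict aliasing rule, and &data[18] need not be suitably aligned for an int. The result is therefore not guaranteed.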

    According to the bitmap header format, the width of the bitmap in pixels is stored as a signed 32-bit little-endian integer beginning at byte offset 18. The syntax

    int width = *(int*)&data[18];
    

    reads the four bytes at offsets 18 through 21, inclusive (the 19th through 22nd bytes of the file, assuming a 32-bit int), and interprets them as an int in the machine's native byte order.

    How?

    • &data[18] gets the address of the unsigned char at index 18 (its type is unsigned char*)
    • (int*) reinterprets that address as an int*, so that dereferencing it reads sizeof(int) bytes instead of one
    • the leading * dereferences the int* to get the int value stored at that address

    So basically, it takes the address of data[18] and reads the bytes at that address as if they were an integer.

    Why doesn't a simple cast to `int` work?

    sizeof(data[18]) is 1, because unsigned char is one byte (values 0-255). A simple cast from unsigned char to int produces a 4-byte int, but its value comes from that single byte only, so the other three bytes of the width are ignored. The cast to (int*), by contrast, changes the pointed-to type, and dereferencing an int* reads sizeof(int) bytes (4 on typical platforms): here, the bytes at offsets 18 through 21, inclusive. The size of the pointer itself (sizeof(&data[18]), typically 4 bytes on 32-bit and 8 on 64-bit systems) plays no role; the number of bytes read is determined by the type being pointed to. This is illustrated by the following example:

    #include <bitset>
    #include <iostream>
    #include <string>
    
    int main() {
        // Stand-in for the file buffer; only offsets 18-21 matter here
        unsigned char data[22] = {};
    
        // Populate 18-21 with a recognizable pattern for demonstration
        std::bitset<8> _bits(std::string("10011010"));
        unsigned long bits = _bits.to_ulong();
        for (int ii = 18; ii < 22; ii ++) {
            data[ii] = static_cast<unsigned char>(bits);
        }
    
        // The pointer casts below are undefined behavior (see the note above);
        // they are used here only to show how many bytes each read covers.
        std::cout << "data[18]                    -> 1 byte  " 
            << std::bitset<32>(data[18]) << std::endl;
        std::cout << "*(unsigned short*)&data[18] -> 2 bytes " 
            << std::bitset<32>(*(unsigned short*)&data[18]) << std::endl;
        std::cout << "*(int*)&data[18]            -> 4 bytes " 
            << std::bitset<32>(*(int*)&data[18]) << std::endl;
    }

    Output:

    data[18]                    -> 1 byte  00000000000000000000000010011010
    *(unsigned short*)&data[18] -> 2 bytes 00000000000000001001101010011010
    *(int*)&data[18]            -> 4 bytes 10011010100110101001101010011010