Tags: c++, binary, endianness, hexdump, xxd

Why do od and my C++ code read in a different endianness than what is rendered by hex editors?


I noticed an odd behavior where od -H and Vim's hex editor (open a file and use the command :%!xxd) display the same data with different endianness. I wrote some C++ code that dumps the first uint32_t from a file, and its endianness matches od's output instead of what the hex editor displays:

dump.cc:

#include <cstdio>
#include <iostream>
#include <stdexcept>
#include <vector>

#include <cstdint>  // uint8_t, uint32_t
#include <cstdlib>  // EXIT_SUCCESS, EXIT_FAILURE
#include <cstring>  // memcpy
#include <string>   // std::string

std::vector<uint8_t> ReadFile(const std::string &filename) {
  FILE *file = fopen(filename.c_str(), "rb");
  if (file == NULL) {
    throw std::runtime_error("Error opening file: " + filename);
  }

  fseek(file, 0L, SEEK_END);
  size_t file_size = ftell(file);
  rewind(file);

  std::vector<uint8_t> buffer(file_size);
  size_t bytes_read = fread(buffer.data(), 1, file_size, file);
  if (bytes_read != file_size) {
    fclose(file);
    throw std::runtime_error("Error reading file: " + filename);
  }
  fclose(file);
  return buffer;
}

int main(int argc, char **argv) {
  if (argc != 2) {
    std::cerr << "usage: dump FILE" << std::endl;
    return EXIT_FAILURE;
  }
  const char *filename = argv[1];
  const std::vector<uint8_t> buf = ReadFile(filename);

  uint32_t first_int;
  // memcpy copies the bytes in file order; the resulting value is then
  // interpreted in the host's native byte order.
  memcpy(&first_int, buf.data(), sizeof(uint32_t));
  std::cout << std::hex << first_int << std::endl;

  return EXIT_SUCCESS;
}

Compile and run:

$ g++ ./dump.cc -o dump
$ ./dump ./dump.cc
636e6923

In comparison, here are the first two lines of od -H:

$ od -H ./dump.cc | head -n 2
0000000          636e6923        6564756c        73633c20        6f696474
0000020          69230a3e        756c636e        3c206564        74736f69

On the other hand, here is what Vim displays:

00000000: 2369 6e63 6c75 6465 203c 6373 7464 696f  #include <cstdio
00000010: 3e0a 2369 6e63 6c75 6465 203c 696f 7374  >.#include <iost

I also opened the file in a hex editor app, and it renders the bytes in the same order that Vim displays:

 0    23 69 6e 63 6c 75 64 65 20 3c 63 73 74 64 69 6f 3e 0a 23 69
20    6e 63 6c 75 64 65 20 3c 69 6f 73 74 72 65 61 6d 3e 0a 23 69

Why do od and my code display a different endianness? How do I get my code to read the data in the same order that these hex editors display?

I am on macOS 14 on Apple Silicon; however, I observe the same behavior on Ubuntu running under WSL on Windows 11 on x86.

Thank you in advance.


Solution

  • Vim and your hex editor work at the byte level, showing the bytes in the order they appear in the file.

    od, on the other hand, interprets the sequence of bytes. The -H option reads four bytes at a time and interprets them as a 32-bit (four-byte) int. Be aware that there are different ways to map the bytes of an int into memory (much like writing on paper, left-to-right or right-to-left), basically two:

    • BIG ENDIAN: bytes are stored left-to-right in memory from the most significant byte to the least.
    • LITTLE ENDIAN: bytes are stored left-to-right in memory from the least significant byte to the most.

    The file starts with the bytes 23 69 6e 63, but since your platforms are little endian (both x86 and Apple Silicon are little endian), the int is read as 0x63*256^3 + 0x6e*256^2 + 0x69*256^1 + 0x23*256^0 = 0x636e6923.
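
    To see this concretely, here is a minimal standalone sketch (not part of the question's program) that stores 0x636e6923 in a uint32_t and prints its bytes in memory order; on a little-endian host it prints 23 69 6e 63, the same order the hex editors show:

      #include <cstdint>
      #include <cstdio>
      #include <cstring>

      int main() {
        const uint32_t value = 0x636e6923;     // the value od -H printed
        uint8_t bytes[sizeof(value)];
        memcpy(bytes, &value, sizeof(value));  // copy out in memory order
        // On a little-endian host this prints "23 69 6e 63": the least
        // significant byte 0x23 ('#') sits first in memory.
        for (uint8_t b : bytes) printf("%02x ", b);
        printf("\n");
        return 0;
      }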

    You can make od read byte by byte with od -tx1, as shown below.
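
    For example (byte values taken from the dumps above; exact column spacing varies between BSD and GNU od):

      $ od -tx1 ./dump.cc | head -n 2
      0000000 23 69 6e 63 6c 75 64 65 20 3c 63 73 74 64 69 6f
      0000020 3e 0a 23 69 6e 63 6c 75 64 65 20 3c 69 6f 73 74

    To get the C++ program to agree with the hex editors, interpret the bytes in file order yourself instead of letting memcpy hand the value to the host's native order. Here is a minimal sketch (FirstUint32BigEndian is a hypothetical helper name, not from the question) that assembles the int from the first four bytes with shifts and therefore behaves identically on any host:

      #include <cstdint>
      #include <vector>

      // Interpret the first four bytes of buf in file order (i.e. as a
      // big-endian value), independent of the host's native endianness.
      // Assumes buf.size() >= 4.
      uint32_t FirstUint32BigEndian(const std::vector<uint8_t> &buf) {
        return (uint32_t{buf[0]} << 24) | (uint32_t{buf[1]} << 16) |
               (uint32_t{buf[2]} << 8)  |  uint32_t{buf[3]};
      }

    Calling this in place of the memcpy in main() prints 23696e63 for dump.cc, matching the hex editors' byte order. On C++23 you could instead keep the memcpy and apply std::byteswap (from <bit>) when std::endian::native == std::endian::little.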