Essentially, I am doing something similar to https://wiki.osdev.org/ELF_Tutorial: I load the data into structs and read the various sections by their offsets. The host is little endian, and I'm trying to analyze files that were cross-compiled for a big endian target. I ran the same code on these big endian files as on the little endian ones, but it segfaults when trying to access the sections.
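#include <elf.h>       // Elf32_Ehdr, Elf32_Shdr
#include <fcntl.h>     // open, O_RDONLY
#include <sys/mman.h>  // mmap, PROT_READ, MAP_PRIVATE

int fd = open(filename, O_RDONLY);
// file_size is the length of the file, obtained elsewhere (e.g. via fstat)
char *header_start = (char *)mmap(0, file_size, PROT_READ, MAP_PRIVATE, fd, 0);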
Elf32_Ehdr* elf_ehdr = (Elf32_Ehdr *)header_start;
Elf32_Shdr* elf_shdrs = (Elf32_Shdr *)((int)header_start + elf_ehdr->e_shoff);
Elf32_Shdr* sh_strtab = &elf_shdrs[elf_ehdr->e_shstrndx];
// code segfaults here when trying to access sh_strtab->sh_offset for big endian
// files, but works just fine for little endian files
Why does the code fail for big endian files?
In a big endian file, elf_ehdr->e_shoff is going to be a big endian integer, and that byte order needs to be respected.
Say we're dealing in 32 bits and e_shoff is a nice small number like 64. In big endian it is recorded in the file as the bytes 00 00 00 40 (0x00000040). But you're reading the file on a little endian CPU, so when those bytes are read out of the file as a binary blob, the CPU treats the last byte as the most significant one and interprets the value as 0x40000000, which is 1073741824.
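To make that concrete, here is a small C sketch with two hypothetical helpers: read_be32 assembles the four bytes most significant byte first, as a big endian reader would, while host_order_u32 copies them straight into a uint32_t, which is effectively what reading e_shoff through the struct cast does. The byte values are the illustrative ones from the paragraph above.

```c
#include <stdint.h>
#include <string.h>

/* Decode a 32-bit value stored most significant byte first (big endian).
   The result is the same on any host. */
static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Reinterpret the same bytes in host byte order, which is what casting the
   mapped file to Elf32_Ehdr * and reading a field effectively does. */
static uint32_t host_order_u32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

With the bytes {0x00, 0x00, 0x00, 0x40}, read_be32 returns 64 on any host, while host_order_u32 returns 0x40000000 (1073741824) on a little endian host.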
Elf32_Shdr* elf_shdrs = (Elf32_Shdr *)((int)header_start + elf_ehdr->e_shoff);
resolves to
Elf32_Shdr* elf_shdrs = (Elf32_Shdr *)((int)header_start + 1073741824);
not
Elf32_Shdr* elf_shdrs = (Elf32_Shdr *)((int)header_start + 64);
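and is going to miss the target by a wide margin: 1 GiB past the start of the mapping, which for any ordinary object file is far beyond its end. Trying to access members of the resulting elf_shdrs wanders into undefined behaviour, which here shows up as a segfault. e_shstrndx is a 16-bit field that is also stored big endian, so the index used to find sh_strtab is wrong as well.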
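If you would rather have a bad offset produce an error than a segfault, one option is to check each offset and size against the length of the mapping before forming a pointer. A minimal sketch; checked_range is a hypothetical helper, not part of any library, and it only catches values that point outside the file (such as an unswapped 1073741824), not every wrong value.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return base + offset if `count` objects of `size`
   bytes starting at `offset` fit inside a mapping of `file_size` bytes,
   otherwise NULL. Written so that no intermediate value can overflow. */
static const void *checked_range(const char *base, size_t file_size,
                                 uint64_t offset, uint64_t size, uint64_t count)
{
    if (offset > file_size)
        return NULL;
    uint64_t avail = (uint64_t)file_size - offset;  /* bytes left after offset */
    if (size != 0 && count > avail / size)          /* count * size > avail */
        return NULL;
    return base + offset;
}
```

In the question's code this would be called with e_shoff, sizeof(Elf32_Shdr) and e_shnum (after any byte swapping) before indexing into the section header table.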
A quick hack fix is
Elf32_Shdr* elf_shdrs = (Elf32_Shdr *)(header_start + ResolveEndian(elf_ehdr->e_shoff));
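where ResolveEndian is a set of overloaded functions that either return the value unchanged, because the file's byte order (recorded in e_ident[EI_DATA]) matches the host's, or reverse its bytes. For many examples of how to do this, see How do I convert between big-endian and little-endian values in C++? The same conversion has to be applied to every multi-byte field you read, including e_shstrndx and sh_offset. Note also that the (int) cast is gone: on a 64-bit host it truncates the pointer.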
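Since the question's code is C, which has no function overloading, here is a sketch of the same idea using C11 _Generic to pick the right width from the argument's type. It assumes the Linux/glibc <elf.h> for EI_NIDENT, EI_DATA and ELFDATA2LSB; the names set_file_byte_order, g_swap_needed and resolve16/32/64 are hypothetical, and the global flag means it handles one file at a time.

```c
#include <elf.h>
#include <stdint.h>

/* Hypothetical global: nonzero when the file's byte order differs from the
   host's. Set once from e_ident before any field is read. */
static int g_swap_needed;

static void set_file_byte_order(const unsigned char e_ident[EI_NIDENT])
{
    const uint16_t probe = 1;
    int host_is_le = *(const unsigned char *)&probe == 1; /* low byte first on LE hosts */
    int file_is_le = e_ident[EI_DATA] == ELFDATA2LSB;
    g_swap_needed = host_is_le != file_is_le;
}

static uint16_t resolve16(uint16_t v)
{
    return g_swap_needed ? (uint16_t)((v >> 8) | (v << 8)) : v;
}

static uint32_t resolve32(uint32_t v)
{
    if (!g_swap_needed)
        return v;
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

static uint64_t resolve64(uint64_t v)
{
    if (!g_swap_needed)
        return v;
    /* Byte-swap each 32-bit half and exchange the halves. */
    return ((uint64_t)resolve32((uint32_t)v) << 32) | resolve32((uint32_t)(v >> 32));
}

/* C11 stand-in for C++ overloading: Elf32_Half (uint16_t), Elf32_Off and
   Elf32_Word (uint32_t), Elf64_Off and Elf64_Xword (uint64_t) all dispatch
   to the matching function. */
#define ResolveEndian(x) _Generic((x), \
    uint16_t: resolve16,               \
    uint32_t: resolve32,               \
    uint64_t: resolve64)(x)

/* Usage with the question's code:
     set_file_byte_order(elf_ehdr->e_ident);
     Elf32_Shdr *elf_shdrs = (Elf32_Shdr *)(header_start + ResolveEndian(elf_ehdr->e_shoff));
     Elf32_Shdr *sh_strtab = &elf_shdrs[ResolveEndian(elf_ehdr->e_shstrndx)];
     Elf32_Off strtab_off = ResolveEndian(sh_strtab->sh_offset);            */
```

The byte order check runs once per file; after that each ResolveEndian call is either a no-op or a plain byte swap.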
The longer fix would not cast memory mapped files to structs, and would instead deserialize the file field by field, taking into account the differences in field sizes (and the resulting differences in offsets) between 32-bit and 64-bit ELF files as well as byte order. The file tells you both: e_ident[EI_CLASS] is ELFCLASS32 or ELFCLASS64, and e_ident[EI_DATA] is ELFDATA2LSB or ELFDATA2MSB. This gives a more robust and portable parser that does not depend on which kind of ELF file it is fed or on how the compiler used to build the parser lays out structs.
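A minimal sketch of that approach in C, using only standard headers: it decodes fields from a raw byte buffer (which could come from read() or mmap), picks the 32- or 64-bit layout from EI_CLASS and the byte order from EI_DATA, and bounds-checks every offset. The names ehdr_info, read_uint, parse_ehdr and section_offset are hypothetical, and only the fields needed to reach a section's sh_offset are handled.

```c
#include <stddef.h>
#include <stdint.h>

/* Host-order copy of the header fields needed to reach the section headers.
   Hypothetical type, wide enough for both ELF classes. */
struct ehdr_info {
    int is64;        /* e_ident[EI_CLASS] == ELFCLASS64 (2) */
    int big_endian;  /* e_ident[EI_DATA] == ELFDATA2MSB (2) */
    uint64_t shoff;
    uint16_t shentsize, shnum, shstrndx;
};

/* Read an n-byte (n <= 8) unsigned integer at p in the file's byte order. */
static uint64_t read_uint(const unsigned char *p, int n, int big_endian)
{
    uint64_t v = 0;
    for (int i = 0; i < n; i++)  /* accumulate most significant byte first */
        v = (v << 8) | p[big_endian ? i : n - 1 - i];
    return v;
}

/* Decode the ELF header in buf. Offsets follow the Elf32_Ehdr and Elf64_Ehdr
   layouts. Returns 0 on success, -1 if buf is not a usable ELF header. */
static int parse_ehdr(const unsigned char *buf, size_t len, struct ehdr_info *h)
{
    if (len < 16 || buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
        return -1;
    if ((buf[4] != 1 && buf[4] != 2) || (buf[5] != 1 && buf[5] != 2))
        return -1;                     /* EI_CLASS is index 4, EI_DATA index 5 */
    h->is64 = buf[4] == 2;
    h->big_endian = buf[5] == 2;
    if (len < (h->is64 ? 64u : 52u))   /* sizeof Elf64_Ehdr / Elf32_Ehdr */
        return -1;
    int be = h->big_endian;
    if (h->is64) {
        h->shoff     = read_uint(buf + 40, 8, be);
        h->shentsize = (uint16_t)read_uint(buf + 58, 2, be);
        h->shnum     = (uint16_t)read_uint(buf + 60, 2, be);
        h->shstrndx  = (uint16_t)read_uint(buf + 62, 2, be);
    } else {
        h->shoff     = read_uint(buf + 32, 4, be);
        h->shentsize = (uint16_t)read_uint(buf + 46, 2, be);
        h->shnum     = (uint16_t)read_uint(buf + 48, 2, be);
        h->shstrndx  = (uint16_t)read_uint(buf + 50, 2, be);
    }
    return 0;
}

/* Fetch sh_offset of section `index`. Returns 0 on success, -1 if the
   section header would lie outside buf. */
static int section_offset(const unsigned char *buf, size_t len,
                          const struct ehdr_info *h, uint16_t index, uint64_t *out)
{
    uint64_t need = h->is64 ? 32u : 20u;  /* bytes up to the end of sh_offset */
    if (index >= h->shnum || h->shentsize < need || h->shoff > len)
        return -1;
    uint64_t at = h->shoff + (uint64_t)index * h->shentsize;
    if (at > len || len - at < need)
        return -1;
    *out = read_uint(buf + at + (h->is64 ? 24 : 16), h->is64 ? 8 : 4, h->big_endian);
    return 0;
}
```

Because every multi-byte value goes through read_uint, the same code reads big and little endian files on either kind of host, and no struct layout from the host compiler is involved.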