
Parsing universal/fat binary files


I'm working on a project to implement a basic nm using memory mapping (mmap). I have been able to parse 64-bit binaries using the code:

void        handle_64(char *ptr)
{
    int                     ncmds;
    struct mach_header_64   *header;
    struct load_command     *lc;
    struct symtab_command   *sym;
    int                     i;

    i = 0;
    header = (struct mach_header_64 *)ptr;
    ncmds = header->ncmds;
    lc = (void *)ptr + sizeof(*header);
    while (i < ncmds)
    {
        if (lc->cmd == LC_SYMTAB)
        {
            sym = (struct symtab_command *)lc;
            build_list (sym->nsyms, sym->symoff, sym->stroff, ptr);
            break;
        }
        lc = (void *) lc + lc->cmdsize;
        i++;
    }
}

According to this link, the only difference between a Mach-O and a fat binary is the fat_header struct above it, but simply skipping over it with

lc = (void *)ptr + sizeof(struct fat_header) + sizeof(struct mach_header_64);

doesn't get me to the load_command area (it segfaults). How do I access the load commands of a fat/universal binary?

I'm working on a 64-bit Mac running macOS High Sierra. Thank you.


Solution

  • You've got multiple problems:

    • What that blog you link to calls "fat header" is more than just struct fat_header.
    • Nowhere is it guaranteed that a Mach-O header will follow immediately after the fat header, just somewhere after it (usually Mach-O segments want to be aligned to whole pages, so putting them immediately after the fat header might not even work).
    • Nowhere is it guaranteed that the 64-bit slice will be the first in the binary, nor that there even will be one.

    Considering all of that, you need to parse the fat header (and not just ignore it) if you want any hope of getting useful results.

    Now, fat_header is defined as follows:

    struct fat_header {
        uint32_t    magic;      /* FAT_MAGIC or FAT_MAGIC_64 */
        uint32_t    nfat_arch;  /* number of structs that follow */
    };
    

    Firstly, the magic value that I usually see for fat binaries is FAT_CIGAM rather than FAT_MAGIC, despite the comment stating otherwise. Take care though: this means that the integers in the fat header are big endian rather than little endian, so on a little-endian machine every field has to be byte-swapped before use. Secondly, the comment on nfat_arch indicates that a number of structs follow this header, namely:

    struct fat_arch {
        cpu_type_t  cputype;    /* cpu specifier (int) */
        cpu_subtype_t   cpusubtype; /* machine specifier (int) */
        uint32_t    offset;     /* file offset to this object file */
        uint32_t    size;       /* size of this object file */
        uint32_t    align;      /* alignment as a power of 2 */
    };
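
    Since the fat header fields are big endian, one way to avoid juggling FAT_MAGIC and FAT_CIGAM is to assemble each field from its individual bytes. The following is only a minimal sketch in standard C: read_be32 and fat_count are hypothetical helper names, and FAT_MAGIC_BE is a local copy of the FAT_MAGIC value from <mach-o/fat.h> so that the snippet does not depend on the macOS headers.

```c
#include <stddef.h>
#include <stdint.h>

#define FAT_MAGIC_BE 0xcafebabeu  /* same value as FAT_MAGIC in <mach-o/fat.h> */

/* Assemble a big-endian 32-bit integer from 4 bytes, whatever the host byte order. */
static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Return nfat_arch, or -1 if buf is too short or does not start with the fat magic. */
static int64_t fat_count(const unsigned char *buf, size_t len)
{
    if (len < 8 || read_be32(buf) != FAT_MAGIC_BE)
        return -1;
    return (int64_t)read_be32(buf + 4);
}
```

    Reading the bytes explicitly gives the same result on big-endian and little-endian hosts, so no swap macro is needed for these two fields.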
    

    The fat header and its fat_arch entries work the same way a "thin" Mach-O header does with its load commands: the count in the header tells you how many entries follow it. fat_arch.offset is the offset from the very beginning of the file. With that, it's quite simple to print all slices of a fat Mach-O:

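    #include <stdio.h>
    #include <stdint.h>
    #include <mach-o/fat.h>
    
    /* Reverse the byte order of a 32-bit value. The cast makes signed fields such as cputype shift as unsigned. */
    #define SWAP32(x) ((((uint32_t)(x) & 0xff000000) >> 24) | (((uint32_t)(x) & 0xff0000) >> 8) | (((uint32_t)(x) & 0xff00) << 8) | (((uint32_t)(x) & 0xff) << 24))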
    
    void print_fat_header(void *buf)
    {
        struct fat_header *hdr = buf;
        if(hdr->magic != FAT_CIGAM)
        {
            fprintf(stderr, "bad magic: %08x\n", hdr->magic);
            return;
        }
        struct fat_arch *archs = (struct fat_arch*)(hdr + 1);
        uint32_t num = SWAP32(hdr->nfat_arch);
        for(size_t i = 0; i < num; ++i)
        {
            const char *name = "unknown";
            switch(SWAP32(archs[i].cputype))
            {
                case CPU_TYPE_I386:     name = "i386";      break;
                case CPU_TYPE_X86_64:   name = "x86_64";    break;
                case CPU_TYPE_ARM:      name = "arm";       break;
                case CPU_TYPE_ARM64:    name = "arm64";     break;
            }
            uint32_t off = SWAP32(archs[i].offset);
            uint32_t magic = *(uint32_t*)((uintptr_t)buf + off);
            printf("%08x-%08x: %-8s (magic %8x)\n", off, off + SWAP32(archs[i].size), name, magic);
        }
    }
    

    Note that the above function is incomplete, as it does not know the length of buf and thus cannot and does not check any accessed memory against it. In a serious implementation, you should make sure to never read outside the buffer you're given. The fact that your code segfaulted also hints at it not doing enough data sanitisation.
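
    The bounds checking described above, combined with picking out one slice, might look roughly like the following. This is a sketch under stated assumptions, not a complete implementation: find_slice, range_ok and be32 are hypothetical names, the constants are local copies of values from <mach-o/fat.h> and <mach/machine.h>, and only the 32-bit fat format (FAT_MAGIC, not FAT_MAGIC_64) is handled.

```c
#include <stddef.h>
#include <stdint.h>

#define FAT_MAGIC_BE   0xcafebabeu  /* FAT_MAGIC from <mach-o/fat.h> */
#define CPUTYPE_X86_64 0x01000007u  /* CPU_TYPE_X86_64 from <mach/machine.h> */
#define FAT_ARCH_SIZE  20u          /* sizeof(struct fat_arch): five 32-bit fields */

/* Read a big-endian 32-bit integer byte by byte. */
static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* True if [off, off + size) lies entirely inside a buffer of len bytes. */
static int range_ok(size_t len, uint64_t off, uint64_t size)
{
    return off <= len && size <= len - off;
}

/*
 * Find the slice for cputype in a fat file of len bytes.
 * Returns a pointer to the start of the thin Mach-O and stores its size,
 * or returns NULL if the file is not fat, is truncated, or has no such slice.
 */
static const unsigned char *find_slice(const unsigned char *buf, size_t len,
                                       uint32_t cputype, uint32_t *size_out)
{
    if (!range_ok(len, 0, 8) || be32(buf) != FAT_MAGIC_BE)
        return NULL;
    uint32_t n = be32(buf + 4);
    /* The whole fat_arch table must fit before any entry is read. */
    if (!range_ok(len, 8, (uint64_t)n * FAT_ARCH_SIZE))
        return NULL;
    for (uint32_t i = 0; i < n; i++) {
        const unsigned char *arch = buf + 8 + (size_t)i * FAT_ARCH_SIZE;
        uint32_t off  = be32(arch + 8);   /* fat_arch.offset */
        uint32_t size = be32(arch + 12);  /* fat_arch.size */
        if (be32(arch) != cputype)        /* fat_arch.cputype */
            continue;
        if (!range_ok(len, off, size))
            return NULL;                  /* slice claims bytes past the end */
        *size_out = size;
        return buf + off;
    }
    return NULL;
}
```

    With the mapping from the question, and assuming the file length is known (for example from fstat), the pointer returned by find_slice(ptr, file_len, CPUTYPE_X86_64, &size) could be passed to handle_64 in place of ptr. Offsets inside the slice, such as symoff and stroff, are relative to the start of that slice rather than to the start of the fat file, and they should be checked against size in the same way.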