Tags: c, linux, linker, elf, embedded-resource

How to read data embedded into the program's own ELF?


The objcopy tool makes it easy to embed arbitrary files into an ELF executable:

objcopy --add-section program.file1=file1.dat \
        --add-section program.file2=file2.dat \
        program program+files

It seems to me that it should be possible for the program+files to access file1 and file2 programmatically without opening and reading any external files. However, there seems to be no easy way to obtain this information from within the running program.

The files were added as named sections of the ELF executable. However, Linux only loads the segments described by the ELF program header table. The added sections are never covered by those segments, since they are not needed for execution.
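To see what the kernel actually mapped, the program header table can be walked from inside the process. The following is only a sketch, assuming a 64-bit ELF and glibc's getauxval; print_load_segments is a made-up name, not an existing API.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/auxv.h>

/* Hypothetical helper: prints every PT_LOAD entry of the running program
 * and returns how many were found. Assumes a 64-bit ELF. */
size_t print_load_segments(void)
{
    /* The kernel passes the address and entry count of the program
     * header table through the auxiliary vector. */
    const Elf64_Phdr *phdr =
        (const Elf64_Phdr *) (uintptr_t) getauxval(AT_PHDR);
    size_t count = (size_t) getauxval(AT_PHNUM);
    size_t loads = 0;

    if (phdr == NULL)
        return 0;

    for (size_t i = 0; i < count; ++i) {
        if (phdr[i].p_type != PT_LOAD)
            continue;
        /* Only [p_offset, p_offset + p_filesz) of the file gets mapped. */
        printf("PT_LOAD: offset=%#llx filesz=%#llx memsz=%#llx\n",
               (unsigned long long) phdr[i].p_offset,
               (unsigned long long) phdr[i].p_filesz,
               (unsigned long long) phdr[i].p_memsz);
        ++loads;
    }
    return loads;
}
```

Comparing the printed ranges with the sh_offset values reported by readelf -S for the added sections shows whether any segment covers them.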

So while it is possible to obtain a pointer to the currently running program's ELF header, it is pointless since the sections were not loaded at all.

#include <elf.h>
#include <stdint.h>
#include <sys/auxv.h>

uintptr_t address = getauxval(AT_PHDR) & -4096;
Elf64_Ehdr *elf = (Elf64_Ehdr *) address;

// dangling pointer, sections aren't loaded by the OS
Elf64_Shdr *sections = (Elf64_Shdr *) ((unsigned char *) elf + elf->e_shoff);

My intention was to search the sections by name at runtime, find the ones prefixed by program. and compute pointers to them so that my code can use them like ordinary memory blocks.

I can't use predefined symbols for this because I want to support an arbitrary number of embedded files, including no embedded file at all. I need to look up these sections at runtime.

Linux will only load segments marked with PT_LOAD. Can these sections be placed in PT_LOAD segments somehow? objcopy does not seem to have the ability to edit the program header table and add new PT_LOAD segments. How would one go about doing that?


Solution

    "My intention was to search the sections by name at runtime, find the ones prefixed by program. and compute pointers to them so that my code can use them like ordinary memory blocks."

    You can find the program on disk (using /proc/self/exe), mmap it [1], decode section headers (see this answer) and then compute pointers to sections of interest and use them as you wish.
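A minimal sketch of that approach, assuming a 64-bit ELF whose section headers have not been stripped. find_own_section and its nth parameter are hypothetical names chosen for illustration; for simplicity it maps the whole file and trusts its own executable rather than validating every offset.

```c
#include <elf.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper, not part of any library. Returns a pointer to the
 * contents of the nth section (counting from 0) whose name starts with
 * `prefix`, or NULL if there is none. Assumes a 64-bit ELF with fewer than
 * 0xff00 sections. The mapping is kept for the life of the process so the
 * returned pointers stay valid. */
const void *find_own_section(const char *prefix, size_t nth,
                             const char **name, size_t *size)
{
    static const unsigned char *image;  /* read-only view of the file */
    static size_t image_size;

    if (image == NULL) {
        struct stat st;
        int fd = open("/proc/self/exe", O_RDONLY);
        if (fd < 0)
            return NULL;
        if (fstat(fd, &st) != 0 || st.st_size < (off_t) sizeof(Elf64_Ehdr)) {
            close(fd);
            return NULL;
        }
        void *map = mmap(NULL, (size_t) st.st_size, PROT_READ, MAP_PRIVATE,
                         fd, 0);
        close(fd);  /* the mapping outlives the descriptor */
        if (map == MAP_FAILED)
            return NULL;
        image = map;
        image_size = (size_t) st.st_size;
    }

    const Elf64_Ehdr *ehdr = (const Elf64_Ehdr *) image;
    if (memcmp(ehdr->e_ident, ELFMAG, SELFMAG) != 0 ||
        ehdr->e_ident[EI_CLASS] != ELFCLASS64 ||
        ehdr->e_shentsize != sizeof(Elf64_Shdr) ||
        ehdr->e_shstrndx >= ehdr->e_shnum ||
        ehdr->e_shoff > image_size ||
        (image_size - ehdr->e_shoff) / sizeof(Elf64_Shdr) < ehdr->e_shnum)
        return NULL;

    const Elf64_Shdr *shdr = (const Elf64_Shdr *) (image + ehdr->e_shoff);
    const Elf64_Shdr *strtab = &shdr[ehdr->e_shstrndx];
    if (strtab->sh_offset > image_size ||
        strtab->sh_size > image_size - strtab->sh_offset)
        return NULL;
    const char *names = (const char *) image + strtab->sh_offset;
    size_t prefix_len = strlen(prefix);

    for (size_t i = 0; i < ehdr->e_shnum; ++i) {
        if (shdr[i].sh_type == SHT_NOBITS ||   /* occupies no file bytes */
            shdr[i].sh_name >= strtab->sh_size ||
            strncmp(names + shdr[i].sh_name, prefix, prefix_len) != 0)
            continue;
        if (nth-- > 0)                         /* skip earlier matches */
            continue;
        *name = names + shdr[i].sh_name;
        *size = shdr[i].sh_size;
        return image + shdr[i].sh_offset;
    }
    return NULL;
}
```

Calling it with the prefix "program." and nth = 0, 1, 2, ... until it returns NULL would enumerate all embedded files, including the case where there are none.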

    "Can these sections be placed in PT_LOAD segments somehow?"

    No: that would require rebuilding parts of the executable which are not possible to rebuild without re-linking the entire program.

    Update:

    If you don't care all that much about memory usage of your program, you could modify the last LOAD segment to "cover" the entire program+files, and then you can skip the separate mmap -- the files would already be in memory.

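    You just need to increase the .p_filesz and .p_memsz of that segment such that phdr.p_offset + phdr.p_filesz == file_size (keeping .p_memsz >= .p_filesz). One caveat: if the segment originally had .p_memsz > .p_filesz (typically because it ends with .bss), the bytes that used to be zero-filled will now be read from the file, so zero-initialized data would need separate handling.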

    The price is that you'll cause data that normally isn't loaded into memory (e.g. the section header table and debug sections, if any) to occupy address space. But with demand paging, the price could be very small -- nothing should access these "extra" memory regions, and so nothing should cause them to be paged in.

    P.S. I know of no standard utility that can update .p_filesz etc., but it's pretty easy to write such a patcher in C or in Python.
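The core of such a patcher might look like the sketch below. It assumes a 64-bit ELF in native byte order and takes "last LOAD segment" to mean the PT_LOAD entry with the highest file offset. extend_last_load is a hypothetical name, and the function works on an in-memory copy of the file so that reading and writing the file is left to the caller.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: extends the PT_LOAD entry with the highest file
 * offset so that it covers the image up to its last byte. `image` must
 * hold the whole file (`size` bytes) and be writable. Returns 0 on
 * success, -1 if the image is not a suitable 64-bit ELF file.
 * Caveat: if that segment had p_memsz > p_filesz (.bss), the formerly
 * zero-filled bytes will come from the file; this sketch ignores that. */
int extend_last_load(unsigned char *image, size_t size)
{
    Elf64_Ehdr ehdr;
    Elf64_Phdr last;
    size_t last_index = 0;
    int found = 0;

    if (size < sizeof ehdr)
        return -1;
    memcpy(&ehdr, image, sizeof ehdr);  /* memcpy avoids alignment issues */
    if (memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0 ||
        ehdr.e_ident[EI_CLASS] != ELFCLASS64 ||
        ehdr.e_phentsize != sizeof(Elf64_Phdr) ||
        ehdr.e_phoff > size ||
        (size - ehdr.e_phoff) / sizeof(Elf64_Phdr) < ehdr.e_phnum)
        return -1;

    /* Pick the PT_LOAD entry that starts latest in the file. */
    for (size_t i = 0; i < ehdr.e_phnum; ++i) {
        Elf64_Phdr ph;
        memcpy(&ph, image + ehdr.e_phoff + i * sizeof ph, sizeof ph);
        if (ph.p_type != PT_LOAD)
            continue;
        if (!found || ph.p_offset >= last.p_offset) {
            last = ph;
            last_index = i;
            found = 1;
        }
    }
    if (!found || last.p_offset > size)
        return -1;

    /* Make p_offset + p_filesz equal the file size and keep
     * p_memsz >= p_filesz, as the ELF format requires. */
    last.p_filesz = size - last.p_offset;
    if (last.p_memsz < last.p_filesz)
        last.p_memsz = last.p_filesz;
    memcpy(image + ehdr.e_phoff + last_index * sizeof last, &last,
           sizeof last);
    return 0;
}
```

One way to use it would be to mmap program+files with PROT_READ | PROT_WRITE and MAP_SHARED, pass the mapping and the file size to the function, and unmap so the change is written back.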


    [1] You don't have to mmap the entire program, just the part of it which contains desired section(s).