android linux reverse-engineering got

Understanding PLT, GOT and hooking them (Linux and Android)


I am trying to better understand PLT and GOT.

  • Since every Linux executable has a PLT and a GOT, does that mean a single process contains more than one PLT and more than one GOT?
  • When people talk about hooking the PLT or GOT, which ones are they referring to? I assume they mean the main executable's PLT and GOT?
  • How does PLT hooking actually work? The PLT contains assembly instructions that jump through the GOT (which holds an entry with the target address). So does PLT hooking mean patching the actual assembly instructions to jump to a different address, while GOT hooking simply replaces the address entry?

Solution

    1. Yes, there is more than one .plt and .got.plt section present in the process memory: the executable and each loaded shared library has its own.
    2. No, they are usually referring to the .got.plt sections of shared libraries. I will explain more on this point later.

    The PLT (Procedure Linkage Table) was introduced for dynamic linking. A hooking method that exploits how the linker relies on this mechanism can be called a PLT hook. In particular, one can change the result of resolving or binding a function symbol by modifying data in the .got.plt section.

    Here is a much simplified model:

    Suppose E is your executable and L is a shared library that is (to be) linked into E. When L is loaded and the linker finishes the whole linking process for L, the .got.plt section of L (it may be another section with a different name that serves the same purpose) is filled with values read from the .dynsym sections of L and its dependencies. Note that .dynsym lists both the symbols exported from L and the symbols imported into L.

    Once all shared libraries of E are loaded, the linker fills the .got.plt section of E, which enables E to find all of its imported symbols (E typically exports none).

    The above process of filling .got.plt is called relocation, and it is driven by the .rel.* and .dynsym sections. The relocation section indicates which addresses in .got.plt are to be filled with new values (i.e., the addresses of functions in memory), and the new values are computed from the offsets given in the .dynsym section, added to the address at which the defining library was loaded.
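    The relocation step can be sketched with a toy model in plain C. This is only an illustration under stated assumptions: `toy_sym`, `toy_reloc`, `toy_lookup` and `toy_relocate` are hypothetical names that stand in for .dynsym entries, .rel.plt entries and the linker's work, and they do not match the real ELF structures or any linker API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-ins for ELF structures; names and layout are illustrative only. */
struct toy_sym {            /* plays the role of a .dynsym entry */
    const char *name;
    uintptr_t   offset;     /* offset of the symbol inside its library */
};

struct toy_reloc {          /* plays the role of a .rel.plt entry */
    size_t      got_index;  /* which .got.plt slot to fill */
    const char *sym_name;   /* which symbol the slot refers to */
};

/* Look up a symbol by name; returns load_base + offset, or 0 if not found. */
static uintptr_t toy_lookup(const struct toy_sym *syms, size_t nsyms,
                            uintptr_t load_base, const char *name)
{
    for (size_t i = 0; i < nsyms; i++) {
        if (strcmp(syms[i].name, name) == 0)
            return load_base + syms[i].offset;
    }
    return 0;
}

/* Apply every relocation: write the resolved address into its GOT slot.
 * Returns the number of slots filled; unresolved symbols are skipped. */
static size_t toy_relocate(uintptr_t *got,
                           const struct toy_reloc *relocs, size_t nrelocs,
                           const struct toy_sym *syms, size_t nsyms,
                           uintptr_t load_base)
{
    size_t filled = 0;
    for (size_t i = 0; i < nrelocs; i++) {
        uintptr_t addr = toy_lookup(syms, nsyms, load_base, relocs[i].sym_name);
        if (addr != 0) {
            got[relocs[i].got_index] = addr;
            filled++;
        }
    }
    return filled;
}
```

    For instance, with a load base of 0x7000 and f at offset 0x120, the slot that the relocation assigns to f ends up holding 0x7120.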

    Now suppose E calls an imported function f from L. In this model, f appears in the .got.plt sections of both E and L, and both entries are filled with the same value, i.e., the memory address of f. This also shows how the linker reacts when an address is queried: it uses the information in the ELF file to find where the address of f is stored and returns that address. That is to say, the value returned by the linker, for example the result of a dlsym call, is an integer that can be cast to a function pointer.

    Recall that the assembly code that calls f does the following:

    1. jump to the entry for f in the .plt section of E,
    2. from there, read the value stored in the .got.plt entry for f in E, which is the address of f in memory,
    3. jump to that address, so that f executes.
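    The three steps can be mimicked in plain C with a function pointer. This is a toy model, not real PLT code: `e_got_f`, `e_plt_f` and `e_caller` are hypothetical names, and the real stub is a few machine instructions emitted by the static linker rather than a C function.

```c
/* The function that lives in L. */
static int real_f(int x) { return x + 1; }

/* Toy .got.plt of E: one slot holding the resolved address of f. */
static int (*e_got_f)(int) = real_f;

/* Toy .plt stub of E: read the slot, then jump to whatever it holds. */
static int e_plt_f(int x) { return e_got_f(x); }

/* Code in E that "calls f": it really calls the stub. */
static int e_caller(int x) { return e_plt_f(x); }
```

    The caller never names the address of f directly; it only goes through the slot. That indirection is what makes the hooks described next possible.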

    To do a PLT hook, one can choose to:

    1. modify the values stored in the .got.plt table of E, which hooks calls to f made from E.
    2. modify the values stored in the .got.plt table of L, so that dlsym and newly loaded libraries will execute the hooked function.
    3. change the .dynsym section of L, which achieves a similar effect to the second method and is therefore treated as the same method as 2.

    Both methods have their limits. The first one hooks functions imported from libraries that are already loaded, but it does not change the result of dlsym. The second one cannot hook functions that are called directly through their absolute address.
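    The difference between the two methods, and their limits, can be illustrated with a toy model that follows the simplified picture above. All names (`l_slot_f`, `e_slot_f`, `toy_dlsym_f`, `toy_bind`) are hypothetical; `l_slot_f` stands for whatever L records as the address of f, and no real linker structure is touched.

```c
typedef int (*fn_t)(int);

static int real_f(int x) { return x + 1; }    /* the original f in L */
static int hook_f(int x) { return x * 100; }  /* the replacement */

/* What L records as the address of f (the "pointer in L" of the model). */
static fn_t l_slot_f = real_f;

/* Toy dlsym: returns whatever L currently records for f. */
static fn_t toy_dlsym_f(void) { return l_slot_f; }

/* Toy binding step: copy L's current value into an importer's slot.
 * This happens once, when the importing object is linked. */
static void toy_bind(fn_t *importer_slot) { *importer_slot = l_slot_f; }

/* E's own slot for f, and code in E that calls f through it. */
static fn_t e_slot_f;
static int e_calls_f(int x) { return e_slot_f(x); }
```

    In this model, overwriting `e_slot_f` with `hook_f` (method 1) redirects `e_calls_f` while `toy_dlsym_f` still returns `real_f`. Overwriting `l_slot_f` (method 2) changes what `toy_dlsym_f` and any later `toy_bind` produce, but a slot that was bound earlier keeps its old value, and code that calls `real_f` by its address is not affected at all.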

    To simplify further, as a memory aid: .got.plt appears in every ELF object of a process that uses dynamic linking. For a given function f in L, a pointer to the memory address of f is first set up in L (the .got.plt entry for f in L is filled), and the value it holds is then used to create the pointer in E (the .got.plt entry for f in E is filled). A PLT hook changes the value held by the pointer in E or in L. If the pointer changed is in E, only calls to f from E are hooked. If the pointer changed is in L, dlsym will return the hooked function and all libraries loaded later will be affected (since their pointers to f are created from the value we changed).

    In most use cases, the first method is applied to a shared library. That is to say, we hook functions that are imported into the target library. Both methods are called PLT hooks, and both change the so-called GOT (Global Offset Table). Since in the second method the function address may be stored in other sections, I personally prefer the name PLT hook.