gdb ctypes gdb-python

memset() does not work when called in GDB through Python APIs


I'm trying to memset the data at some address in a gdb session.

Let's say it is initially filled with 1's and I'm trying to overwrite it with 0's.

(gdb) set $i = (int*)malloc(sizeof(int))

(gdb) set *$i = -1

(gdb) x/t $i

0x76d8550:  11111111111111111111111111111111

The data is not modified at all when I:

  • Make a gdb.Value out of the memset function pointer and call it in Python with the right address;

  • Run ctypes' memset(), passing it the right address.

(gdb) pi memset = gdb.parse_and_eval("(void*(*)(void*,int,size_t))memset").dereference()

(gdb) pi str(memset)

'{void *(void *, int, size_t)} 0x7fffe992e760 <memset>'

(gdb) pi

>>> i = 0x76d8550

>>> memset(i,0,4)

<gdb.Value object at 0x7fdc5c0fbef0>

>>> gdb.execute("x/t $i")

0x76d8550:  11111111111111111111111111111111
>>> import ctypes

>>> ctypes.memset(i,0,4)

124618064

>>> gdb.execute("x/t $i")

0x76d8550:  11111111111111111111111111111111

The data is modified as expected when I:

  • Evaluate a string with the complete memset() expression using gdb.parse_and_eval().
>>> gdb.parse_and_eval("(void*)memset({},0,4)".format(i))

<gdb.Value object at 0x7fdb9eef58b0>

>>> gdb.execute("x/t $i")

0x76d8550:  00000000000000000000000000000000
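The working call above could be wrapped in a small helper that only builds the expression string. This is a minimal sketch: `build_memset_expr` is a hypothetical name, and the extra `(void*)` cast on the address is an assumption added for readability, not something the original session used.

```python
def build_memset_expr(address: int, value: int, size: int) -> str:
    """Return a GDB expression string that calls memset() in the inferior."""
    # The leading (void*) cast mirrors the working expression above; it
    # gives GDB a return type for the call.
    return "(void*)memset((void*){:#x}, {}, {})".format(address, value, size)


expr = build_memset_expr(0x76d8550, 0, 4)
print(expr)

# Inside GDB's Python interpreter one would then evaluate it with:
#   gdb.parse_and_eval(expr)
```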

Any explanation of why the first two options aren't working?

Thanks


Solution

  • Judging by the addresses you get printed you are probably running on Linux/x86_64, and possibly using GLIBC as your standard C library. If so ...

    ... memset is complicated.

    First, there are two separate implementations of memset -- a minimal one in ld-linux.so and a full-function one inside libc.so.6.

    Second, the full implementation in libc.so.6 is a GNU IFUNC, which means that it doesn't itself write to memory; it just returns the address of the function that should be used to write to memory on a given processor.

    Lastly, as sbssa commented, ctypes.memset() can't possibly work, because that's a memset within GDB itself, not the memset in the inferior (being debugged) process. By calling ctypes.memset(i,0,4) you are corrupting a random location within GDB. This could result in anything, from "no effect" (if the corrupted address was unused but valid), to the expression crashing immediately (if the "to be corrupted" address is invalid), to a random crash in GDB later (if that corrupted address is actually used by GDB for something).
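To illustrate the last point, here is a minimal sketch that runs in any plain Python interpreter (no GDB involved). It shows that ctypes.memset() treats its first argument as an address in the calling process. The buffer name `own_buffer` is made up for the example.

```python
import ctypes

# A 4-byte buffer that lives in *this* Python process, filled with 0xFF.
own_buffer = (ctypes.c_ubyte * 4)(0xFF, 0xFF, 0xFF, 0xFF)
address = ctypes.addressof(own_buffer)

# ctypes.memset() interprets the integer as an address in the calling
# process. Under GDB's embedded Python, that process is GDB itself.
returned = ctypes.memset(address, 0, 4)

print(bytes(own_buffer))    # contents of the local buffer after the call
print(returned == address)  # memset hands back the destination address
```

This also explains the integer that ctypes.memset() returned in the question: it is the destination address, interpreted inside GDB's own address space rather than the inferior's.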


    Putting this all together:

    #include <string.h>
    
    int jj = -1;
    int main()
    {
      memset(&jj, 0, sizeof(jj)); return 0;
      return 0;
    }
    

    Compiled with gcc -g x.c, and running under GDB on Fedora 38 x86_64:

    gdb -q ./a.out
    Reading symbols from ./a.out...
    (gdb) start
    Temporary breakpoint 1 at 0x40112a: file x.c, line 6.
    Starting program: /tmp/a.out
    
    Temporary breakpoint 1, main () at x.c:6
    6         memset(&jj, 0, sizeof(jj)); return 0;
    Missing separate debuginfos, use: dnf debuginfo-install glibc-2.37-10.fc38.x86_64
    (gdb) p &memset
    $1 = (void (*)(void)) 0x7ffff7fec530 <memset>
    
    (gdb) pi memset = gdb.parse_and_eval("(void*(*)(void*,int,size_t))memset").dereference()
    (gdb) pi print(str(memset))
    {void *(void *, int, size_t)} 0x7ffff7fec530 <memset>
    
    (gdb) info sym 0x7ffff7fec530
    memset in section .text of /lib64/ld-linux-x86-64.so.2
    

    Here you can see that GDB got the wrong memset (the minimal implementation). It would still work, but is suboptimal and may have other restrictions -- it was never intended to be used outside of ld-linux itself. For example, it may assume that the buffer is 8-byte aligned, or that the size is at least 8 bytes, etc.
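If the goal is only to overwrite bytes in the debugged process, one way to avoid picking a memset at all is GDB's own `Inferior.write_memory()`. The sketch below is an assumption about what would suit the question, not something from the session above. `zero_inferior_memory` and `FakeInferior` are hypothetical names, and the fake class exists only so the snippet runs outside GDB.

```python
def zero_inferior_memory(inferior, address: int, length: int) -> None:
    """Overwrite `length` bytes at `address` in the debugged process with zeros."""
    # Inferior.write_memory() copies a bytes-like buffer into the inferior's
    # address space; no function inside the inferior is called.
    inferior.write_memory(address, bytes(length))


class FakeInferior:
    """Hypothetical stand-in for gdb.Inferior so the sketch runs outside GDB."""

    def __init__(self):
        self.memory = {}

    def write_memory(self, address, buffer):
        for offset, byte in enumerate(bytes(buffer)):
            self.memory[address + offset] = byte


fake = FakeInferior()
zero_inferior_memory(fake, 0x76d8550, 4)
print(sorted(fake.memory.items()))
```

Inside a GDB session the call would be `zero_inferior_memory(gdb.selected_inferior(), 0x76d8550, 4)`.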

    What about the real memset that is called in main?

    (gdb) b memset
    Breakpoint 2 at 0x7ffff7e7cf60 (2 locations)
    (gdb) c
    Continuing.
    
    Breakpoint 2.1, 0x00007ffff7e7cf60 in memset_ifunc () from /lib64/libc.so.6
    (gdb) bt
    #0  0x00007ffff7e7cf60 in memset_ifunc () from /lib64/libc.so.6
    #1  0x00007ffff7fdac42 in elf_ifunc_invoke (addr=<optimized out>) at ../sysdeps/x86_64/dl-irel.h:32
    #2  _dl_fixup (l=0x7ffff7ffe2d0, reloc_arg=<optimized out>) at dl-runtime.c:125
    #3  0x00007ffff7fdcf3e in _dl_runtime_resolve_xsavec () at ../sysdeps/x86_64/dl-trampoline.h:130
    #4  0x000000000040113e in main () at x.c:6
    

    Note that this is the IFUNC I was talking about.

    (gdb) fin
    Run till exit from #0  0x00007ffff7e7cf60 in memset_ifunc () from /lib64/libc.so.6
    0x00007ffff7fdac42 in _dl_fixup (l=0x7ffff7ffe2d0, reloc_arg=<optimized out>) at dl-runtime.c:125
    125     dl-runtime.c: No such file or directory.
    (gdb) p/x $rax
    $2 = 0x7ffff7f37950
    (gdb) info sym 0x7ffff7f37950
    __memset_avx2_unaligned in section .text of /lib64/libc.so.6
    

    __memset_avx2_unaligned is the actual memset implementation selected for this host (out of several possible; the other possibilities in this build of GLIBC are: __memset_erms, __memset_avx2_unaligned_erms, __memset_evex_unaligned, __memset_evex_unaligned_erms).
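The resolver-versus-implementation split can be modelled in plain Python. This is only an analogy, not how GLIBC or the dynamic loader is implemented. All names are hypothetical, and the `cpu_has_avx2` flag stands in for the CPU feature detection a real resolver performs.

```python
def memset_plain(buf: bytearray, value: int, n: int) -> bytearray:
    """Byte-at-a-time implementation."""
    for k in range(n):
        buf[k] = value
    return buf


def memset_wide(buf: bytearray, value: int, n: int) -> bytearray:
    """Stand-in for a vectorised implementation."""
    buf[:n] = bytes([value]) * n
    return buf


def memset_ifunc(*_ignored_args, cpu_has_avx2: bool = True):
    """Resolver: picks an implementation and never touches any buffer."""
    return memset_wide if cpu_has_avx2 else memset_plain


data = bytearray(b"\xff" * 4)

# Calling the resolver as though it were memset changes nothing; it only
# returns the function that should be called.
chosen = memset_ifunc(data, 0, 4)
print(data)

# Calling the selected implementation is what actually writes.
chosen(data, 0, 4)
print(data)
```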

    Note that even though we've already returned from "memset" (really the memset_ifunc resolver), the value of jj is still -1:

    (gdb) x/t &jj
    0x40400c <jj>:  11111111111111111111111111111111
    (gdb) watch -l jj
    Hardware watchpoint 3: -location jj
    (gdb) c
    Continuing.
    
    Hardware watchpoint 3: -location jj
    
    Old value = -1
    New value = 0
    0x00007ffff7f37ae4 in __memset_avx2_unaligned_erms () from /lib64/libc.so.6
    

    P.S. Why is the value changed by __memset_avx2_unaligned_erms() and not by __memset_avx2_unaligned() ?

    Because the former jumps (a "tail call") into the latter:

    (gdb) disas __memset_avx2_unaligned
    Dump of assembler code for function __memset_avx2_unaligned:
       0x00007ffff7f37950 <+0>:     endbr64
       0x00007ffff7f37954 <+4>:     vmovd  %esi,%xmm0
       0x00007ffff7f37958 <+8>:     mov    %rdi,%rax
       0x00007ffff7f3795b <+11>:    cmp    $0x20,%rdx
       0x00007ffff7f3795f <+15>:    jb     0x7ffff7f37aa0 <__memset_avx2_unaligned_erms+224>
       0x00007ffff7f37965 <+21>:    vpbroadcastb %xmm0,%ymm0
       0x00007ffff7f3796a <+26>:    cmp    $0x40,%rdx
       0x00007ffff7f3796e <+30>:    ja     0x7ffff7f37a09 <__memset_avx2_unaligned_erms+73>
       0x00007ffff7f37974 <+36>:    vmovdqu %ymm0,-0x20(%rdi,%rdx,1)
       0x00007ffff7f3797a <+42>:    vmovdqu %ymm0,(%rdi)
       0x00007ffff7f3797e <+46>:    vzeroupper
       0x00007ffff7f37981 <+49>:    ret