
ctypes: Cast string to function?

I was reading the article Tips for Evading Anti-Virus During Pen Testing and was surprised by the Python program it gives:

from ctypes import *
shellcode = '\xfc\xe8\x89\x00\x00....'

memorywithshell = create_string_buffer(shellcode, len(shellcode))
shell = cast(memorywithshell, CFUNCTYPE(c_void_p))

The shellcode is shortened. Can someone explain what is going on? I'm familiar with both Python and C, and I've tried reading up on the ctypes module, but two main questions remain:

  • What is stored in shellcode?
    I know this has something to do with C (in the article it is a shellcode from Metasploit, and a different notation for ASCII was chosen), but I cannot tell whether it's C source (probably not) or the output of some sort of compilation (which?).

  • Depending on the first question, what's the magic happening during the cast?
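To make both questions concrete, here is a minimal sketch, assuming Python 3 (so the literal needs a b prefix). The shellcode literal is only a sequence of bytes; create_string_buffer copies those bytes into a ctypes-owned buffer, and cast merely reinterprets the buffer's address as another pointer type, without copying, converting or compiling anything. The sketch uses only the four bytes visible in the question and views them as unsigned bytes instead of as a function; the helper name view_bytes is hypothetical.

```python
from ctypes import POINTER, addressof, c_ubyte, c_void_p, cast, create_string_buffer


def view_bytes(data):
    """Hypothetical helper: copy data into a ctypes buffer and view it through cast()."""
    # Explicit size = len(data): no extra NUL terminator is appended,
    # mirroring create_string_buffer(shellcode, len(shellcode)) in the question.
    buf = create_string_buffer(data, len(data))

    # cast() returns a new pointer object of a different type that holds the
    # very same address as the buffer; the bytes themselves are untouched.
    as_ubytes = cast(buf, POINTER(c_ubyte))
    same_address = cast(buf, c_void_p).value == addressof(buf)

    return buf.raw, [as_ubytes[i] for i in range(len(data))], same_address


# Only the first bytes from the question; the rest of the shellcode is omitted.
raw, values, same = view_bytes(b"\xfc\xe8\x89\x00")
print(raw, [hex(v) for v in values], same)
```

Casting to CFUNCTYPE(c_void_p) instead of POINTER(c_ubyte), as the article does, changes only how Python treats that same address: as something to call rather than something to index.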


  • Have a look at this shellcode, I took it from here (it pops up a MessageBoxA):

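    #include <stdio.h>
    typedef void (* function_t)(void);
    unsigned char shellcode[] =
        "\xfc\x33\xd2" /* ... remaining shellcode bytes omitted ... */;
    void real_function(void) {
        puts("I'm here");
    }
    int main(int argc, char **argv)
    {
        function_t function = (function_t) &shellcode[0];
        real_function();  /* direct call:   call <real_function> */
        function();       /* indirect call: call *%eax */
        return 0;
    }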
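The line function_t function = (function_t) &shellcode[0]; is the whole trick: a function pointer is only an address, and calling it makes the CPU jump to that address. The ctypes cast(..., CFUNCTYPE(...)) in the question does the same from Python. As a harmless sketch, the code below takes the address of an ordinary C library function (abs from libc, assuming a Linux or macOS system where ctypes can load the C library this way), keeps only the bare integer address, and turns that address back into a callable with cast. The prototype name ABS_PROTO is illustrative.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX system where the C library can be located this way.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Prototype equivalent to a C typedef of int (*)(int).
ABS_PROTO = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int)

# Reduce the function to a plain integer address, like &shellcode[0] in C.
address = ctypes.cast(libc.abs, ctypes.c_void_p).value

# Reinterpret that address as a function pointer, like (function_t) in C
# or cast(memorywithshell, CFUNCTYPE(c_void_p)) in the question.
abs_via_pointer = ctypes.cast(ctypes.c_void_p(address), ABS_PROTO)

print(hex(address), abs_via_pointer(-42))
```

Nothing in cast checks that real code lives at the address; it is the caller's job to make sure it does, which is exactly what the shellcode example relies on.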

    Compile it and load it in any debugger; I'll use gdb:

    > gcc shellcode.c -o shellcode
    > gdb -q shellcode.exe
    Reading symbols from shellcode.exe...done.

    Disassemble the main function to see the difference between calling real_function and calling function:

    (gdb) disassemble main
    Dump of assembler code for function main:
       0x004013a0 <+0>:     push   %ebp
       0x004013a1 <+1>:     mov    %esp,%ebp
       0x004013a3 <+3>:     and    $0xfffffff0,%esp
       0x004013a6 <+6>:     sub    $0x10,%esp
       0x004013a9 <+9>:     call   0x4018e4 <__main>
       0x004013ae <+14>:    movl   $0x402000,0xc(%esp)
       0x004013b6 <+22>:    call   0x40138c <real_function> ; <- here we call our `real_function`
       0x004013bb <+27>:    mov    0xc(%esp),%eax
       0x004013bf <+31>:    call   *%eax                    ; <- here we call the address that is loaded in eax (the address of the beginning of our shellcode)
       0x004013c1 <+33>:    mov    $0x0,%eax
       0x004013c6 <+38>:    leave
       0x004013c7 <+39>:    ret
    End of assembler dump.

    There are two calls; let's set a breakpoint at <main+31> to see what is loaded into eax:

    (gdb) break *(main+31)
    Breakpoint 1 at 0x4013bf
    (gdb) run
    Starting program: shellcode.exe
    [New Thread 2856.0xb24]
    I'm here
    Breakpoint 1, 0x004013bf in main ()
    (gdb) disassemble
    Dump of assembler code for function main:
       0x004013a0 <+0>:     push   %ebp
       0x004013a1 <+1>:     mov    %esp,%ebp
       0x004013a3 <+3>:     and    $0xfffffff0,%esp
       0x004013a6 <+6>:     sub    $0x10,%esp
       0x004013a9 <+9>:     call   0x4018e4 <__main>
       0x004013ae <+14>:    movl   $0x402000,0xc(%esp)
       0x004013b6 <+22>:    call   0x40138c <real_function>
       0x004013bb <+27>:    mov    0xc(%esp),%eax
    => 0x004013bf <+31>:    call   *%eax                    ; now we are here
       0x004013c1 <+33>:    mov    $0x0,%eax
       0x004013c6 <+38>:    leave
       0x004013c7 <+39>:    ret
    End of assembler dump.

    Look at the first 3 bytes of the data at the address stored in eax:

    (gdb) x/3x $eax
    0x402000 <shellcode>:   0xfc    0x33    0xd2
    (gdb)                    ^-------^--------^---- the first 3 bytes of the shellcode
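The same inspection can be sketched from the Python side without executing anything. The CFUNCTYPE object produced by cast holds exactly the buffer's address (the value that ends up in eax before call *%eax), and ctypes.string_at reads raw bytes at an address, roughly like gdb's x/3x. This sketch assumes Python 3 and uses only the three bytes gdb shows; the function object is deliberately never called, and the variable name eax is just a label.

```python
from ctypes import CFUNCTYPE, addressof, c_void_p, cast, create_string_buffer, string_at

# First three bytes of the shellcode as shown by gdb; the rest is omitted.
data = b"\xfc\x33\xd2"
buf = create_string_buffer(data, len(data))

# Same kind of cast as in the question; `function` is NOT called here.
function = cast(buf, CFUNCTYPE(None))

# The address a call would jump to (what gdb showed in eax).
eax = cast(function, c_void_p).value

# Rough equivalent of gdb's `x/3x $eax`: read 3 bytes at that address.
first_bytes = string_at(eax, 3)

print(eax == addressof(buf), [hex(b) for b in first_bytes])
```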

    So the CPU will call 0x402000, the beginning of our shellcode. Let's disassemble whatever is at 0x402000:

    (gdb) disassemble 0x402000
    Dump of assembler code for function shellcode:
       0x00402000 <+0>:     cld
       0x00402001 <+1>:     xor    %edx,%edx
       0x00402003 <+3>:     mov    $0x30,%dl
       0x00402005 <+5>:     pushl  %fs:(%edx)
       0x00402008 <+8>:     pop    %edx
       0x00402009 <+9>:     mov    0xc(%edx),%edx
       0x0040200c <+12>:    mov    0x14(%edx),%edx
       0x0040200f <+15>:    mov    0x28(%edx),%esi
       0x00402012 <+18>:    xor    %ecx,%ecx
       0x00402014 <+20>:    mov    $0x18,%cl
       0x00402016 <+22>:    xor    %edi,%edi
       0x00402018 <+24>:    xor    %eax,%eax
       0x0040201a <+26>:    lods   %ds:(%esi),%al
       0x0040201b <+27>:    cmp    $0x61,%al
       0x0040201d <+29>:    jl     0x402021 <shellcode+33>

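    As you can see, shellcode is nothing more than machine code, i.e. already assembled CPU instructions (gdb simply shows them disassembled). The only difference is in the way these instructions are written: shellcode uses special techniques to make it more portable, for example it never relies on a fixed address. The instructions above are an example of this: on 32-bit Windows, %fs:0x30 holds a pointer to the Process Environment Block, and the code follows pointers from there to the list of loaded modules instead of hard-coding where a DLL is loaded.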

    The Python equivalent of the above program:

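    from ctypes import *
    shellcode_data = b"\xfc\x33\xd2"  # ... remaining shellcode bytes omitted ...
    shellcode = c_char_p(shellcode_data)         # char * pointing at the raw bytes
    function = cast(shellcode, CFUNCTYPE(None))  # like (function_t) &shellcode[0] in C
    function()                                   # like function(); in C, i.e. call *%eax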
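One detail of the Python version is easy to miss. Shellcode usually contains NUL bytes (the question's snippet starts with \xfc\xe8\x89\x00\x00), so reading it back through c_char_p.value stops at the first NUL. cast never looks at the contents, though; it only takes the address, and all the bytes are still there. A short check, assuming Python 3 and using only the bytes visible in the question:

```python
from ctypes import c_char_p, c_void_p, cast, string_at

# Only the bytes visible in the question; the real shellcode is longer.
data = b"\xfc\xe8\x89\x00\x00"
p = c_char_p(data)

# Read as a C string, the data stops at the first NUL byte...
as_c_string = p.value

# ...but the pointer still refers to every byte, and the address is all cast() uses.
address = cast(p, c_void_p).value
all_bytes = string_at(address, len(data))

print(as_c_string, all_bytes)
```

This is also why the article's version passes an explicit length to create_string_buffer: the size comes from len(shellcode), not from searching for a terminating NUL.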