c assembly lambda functional-programming shellcode

Is it practical to create a C language addon for anonymous functions?

I know that C compilers are capable of taking standalone code, and generate standalone shellcode out of it for the specific system they are targeting.

For example, given the following in anon.c:

int give3() {
    return 3;
}

I can run

gcc anon.c -o anon.obj -c
objdump -D anon.obj

which gives me (on MinGW):

anon1.obj:     file format pe-i386


Disassembly of section .text:

00000000 <_give3>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   b8 03 00 00 00          mov    $0x3,%eax
   8:   5d                      pop    %ebp
   9:   c3                      ret    
   a:   90                      nop
   b:   90                      nop

So I can make main like this:

main.c

#include <stdio.h>
#include <stdint.h>

int main(int argc, char **argv)
{
    uint8_t shellcode[] = {
        0x55,
        0x89, 0xe5,
        0xb8, 0x03, 0x00, 0x00, 0x00,
        0x5d, 0xc3,
        0x90,
        0x90
    };

    int (*p_give3)() = (int (*)())shellcode;
    printf("%d.\n", (*p_give3)());
}

My question is: is it practical to automate converting a self-contained anonymous function, one that refers to nothing outside its own scope or its arguments, into shellcode like this?

eg:

#include <stdio.h>
#include <stdint.h>

int main(int argc, char **argv)
{
    uint8_t shellcode[] = [@[
        int anonymous() {
            return 3;
        }
    ]];

    int (*p_give3)() = (int (*)())shellcode;
    printf("%d.\n", (*p_give3)());
}

Which would compile the text into shellcode, and place it into the buffer?

The reason I ask is that I really like writing C, but setting up pthreads and callbacks is incredibly painful. As soon as you go one step above C to get the notion of "lambdas", you lose your language's ABI (e.g. C++ has lambdas, but everything you do in C++ is suddenly implementation-dependent). "Lisp-like" scripting addons (e.g. plugging in Lisp, Perl, JavaScript/V8, or any other runtime that already knows how to generalize callbacks) make callbacks very easy, but are also much more expensive than tossing shellcode around.

If this is practical, then functions that are only called once can be placed in the body of the function calling them, reducing global scope pollution. It also means that you do not need to generate the shellcode manually for each system you are targeting: each system's C compiler already knows how to turn self-contained C into assembly, so why should you do it for the compiler and ruin the readability of your own code with a bunch of binary blobs?

So the question is: is this practical (for functions which are perfectly self-contained, e.g. even if they want to call puts, puts has to be given as an argument or inside a hash table/struct in an argument)? Or is there some issue preventing this from being practical?
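As a sketch of what "perfectly self-contained" means here, the function below calls nothing by name: the output routine arrives through a struct of function pointers. The names `struct env`, `greet`, and `run_greet` are hypothetical and exist only for this illustration.

```c
#include <stdio.h>

/* Hypothetical "environment": every external function the callee may use. */
struct env {
    int (*put)(const char *s);
};

/* Calls nothing by name; both the routine and the message are arguments. */
static int greet(const struct env *e, const char *msg)
{
    return e->put(msg);
}

/* The caller decides which implementation to plug in, here puts. */
int run_greet(void)
{
    struct env e = { puts };
    return greet(&e, "hello");
}
```

Calling `run_greet()` writes the message through whatever function was stored in the struct, so `greet` itself never names `puts`.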

Solution

"I know that C compilers are capable of taking standalone code, and generate standalone shellcode out of it for the specific system they are targeting."

Turning source into machine code is what compilation is. Shellcode is machine code with specific constraints, none of which apply to this use-case. You just want ordinary machine code like compilers generate when they compile functions normally.

AFAICT, what you want is exactly what you get from static int foo(int x){ ...; }, and then passing foo as a function pointer, i.e. a block of machine code with a label attached, in the code section of your executable.
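A minimal sketch of that approach in ISO C, assuming a qsort comparator as the callback. The name `cmp_int` is made up for this example; `static` keeps it local to the translation unit, which covers most of the namespace-pollution concern.

```c
#include <stdlib.h>

/* File-local comparator: `static` keeps the name out of the global namespace. */
static int cmp_int(const void *va, const void *vb)
{
    const int *a = va, *b = vb;
    /* Negative, zero, or positive, as qsort requires (and no overflow). */
    return (*a > *b) - (*a < *b);
}

/* Sorts arr in ascending order; cmp_int is passed as a plain function pointer. */
void sort_integers(int *arr, size_t len)
{
    qsort(arr, len, sizeof arr[0], cmp_int);
}
```

For instance, calling `sort_integers` on an array holding 3, 1, 2 leaves it as 1, 2, 3. The callback is ordinary compiled code in the executable's code section, with no executable data buffer involved.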

Jumping through hoops to get compiler-generated machine code into an array is not even close to worth the portability downsides (esp. in terms of making sure the array is in executable memory).

It seems the only thing you're trying to avoid is having a separately-defined function with its own name. That's an incredibly small benefit that doesn't come close to justifying doing anything like you're suggesting in the question. AFAIK, there's no good way to achieve it in ISO C11, but:

Some compilers support nested functions as a GNU extension:

This compiles (with gcc6.2). On Godbolt, I used -xc to compile it as C, not C++. It also compiles with ICC17, but not clang3.9.

#include <stdlib.h>

void sort_integers(int *arr, size_t len)
{
  int bar(){return 3;}  // gcc warning: ISO C forbids nested functions [-Wpedantic]

  int cmp(const void *va, const void *vb) {
    const int *a=va, *b=vb;       // taking const int* args directly gives a warning, which we could silence with a cast
    return *a > *b;               // simplified to keep the asm short; a strict qsort comparator returns <0, 0, or >0
  }

  qsort(arr, len, sizeof(int), cmp);
}

The asm output is:

cmp.2286:
    mov     eax, DWORD PTR [rsi]
    cmp     DWORD PTR [rdi], eax
    setg    al
    movzx   eax, al
    ret
sort_integers:
    mov     ecx, OFFSET FLAT:cmp.2286
    mov     edx, 4
    jmp     qsort

Notice that no definition for bar() was emitted, because it's unused.

Programs with nested functions built without optimization will have executable stacks (for reasons explained below). So if you use this, make sure you enable optimization if you care about security.

BTW, nested functions can even access variables in their parent (like lambdas). Changing cmp into a function that does return len results in this highly surprising asm:

__attribute__((noinline)) 
void call_callback(int (*cb)()) {
  cb();
}

void foo(int *arr, size_t len) {
  int access_parent() { return len; }
  call_callback(access_parent);
}

## gcc5.4
access_parent.2450:
    mov     rax, QWORD PTR [r10]
    ret
call_callback:
    xor     eax, eax
    jmp     rdi
foo:
    sub     rsp, 40
    mov     eax, -17599
    mov     edx, -17847
    lea     rdi, [rsp+8]
    mov     WORD PTR [rsp+8], ax
    mov     eax, OFFSET FLAT:access_parent.2450
    mov     QWORD PTR [rsp], rsi
    mov     QWORD PTR [rdi+8], rsp
    mov     DWORD PTR [rdi+2], eax
    mov     WORD PTR [rdi+6], dx
    mov     DWORD PTR [rdi+16], -1864106167
    call    call_callback
    add     rsp, 40
    ret

I just figured out what this mess is about while single-stepping it: Those MOV-immediate instructions are writing machine-code for a trampoline function to the stack, and passing that as the actual callback.

gcc must ensure that the ELF metadata in the final binary tells the OS that the process needs an executable stack (note readelf -l shows GNU_STACK with RWE permissions). So nested functions that access outside their scope prevent the whole process from having the security benefits of NX stacks. (With optimization disabled, this still affects programs that use nested functions that don't access stuff from outer scopes, but with optimization enabled gcc realizes that it doesn't need the trampoline.)

The trampoline (from gcc5.2 -O0 on my desktop) is:

   0x00007fffffffd714:  41 bb 80 05 40 00       mov    r11d,0x400580   # address of access_parent.2450
   0x00007fffffffd71a:  49 ba 10 d7 ff ff ff 7f 00 00   movabs r10,0x7fffffffd710   # address of `len` in the parent stack frame
   0x00007fffffffd724:  49 ff e3        rex.WB jmp r11 
    # This can't be a normal rel32 jmp, and indirect is the only way to get an absolute near jump in x86-64.

   0x00007fffffffd727:  90      nop
   0x00007fffffffd728:  00 00   add    BYTE PTR [rax],al
   ...

(GCC's own documentation does call this kind of wrapper a trampoline.)

This finally makes sense: r10 is call-clobbered, so any function may overwrite it without saving it. There's no register that foo could set that would be guaranteed to still have that value when the callback is eventually called.

The x86-64 SysV ABI says that r10 is the "static chain pointer", but C/C++ don't use that. (Which is why r10 is treated like r11, as a pure scratch register).

Obviously a nested function that accesses variables in the outer scope can't be called after the outer function returns, e.g. if call_callback held onto the pointer for future use from other callers, you would get bogus results. When the nested function doesn't access the outer scope, gcc doesn't need the trampoline (with optimization enabled), so the function works just like a separately-defined one and its pointer can be passed around arbitrarily.
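For comparison, the usual portable way to get the same effect as access_parent without a trampoline is to pass the "captured" state explicitly through a context pointer. This is only a sketch: `len_callback`, `access_len`, `call_with_ctx`, and `foo_ctx` are hypothetical names, and the callback signature is changed from the example above to carry the extra `void *` argument.

```c
#include <stddef.h>

/* Hypothetical callback type: the callee receives its "captured" state explicitly. */
typedef size_t (*len_callback)(void *ctx);

/* Plays the role of access_parent, but reads len through ctx, not a static chain. */
static size_t access_len(void *ctx)
{
    return *(const size_t *)ctx;
}

/* Like call_callback, extended to forward the context pointer. */
static size_t call_with_ctx(len_callback cb, void *ctx)
{
    return cb(ctx);
}

size_t foo_ctx(int *arr, size_t len)
{
    (void)arr;                              /* unused, kept to mirror foo's signature */
    return call_with_ctx(access_len, &len); /* &len is valid only while foo_ctx runs */
}
```

The same lifetime rule applies: the context points into foo_ctx's stack frame, so the callback must not be invoked with it after foo_ctx returns. The difference is that no code is written to the stack, so the process can keep a non-executable stack.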