I am learning about shellcode.
I have found this shellcode in a tutorial:
python -c 'print "\x90\x90\x90\x90\x90\x90\x90\x90\x90\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x53\x89\xe1\xb0\x0b\xcd\x80 "' > shellcode
What I want to do is disassemble this very basic shellcode in order to understand how it works.
Here is what I did:
$ objdump -D -b binary -m i8086 shellcode
shellcode: file format binary
Disassembly of section .data:
00000000 <.data>:
0: 90 nop
1: 90 nop
2: 90 nop
3: 90 nop
4: 90 nop
5: 90 nop
6: 90 nop
7: 90 nop
8: 90 nop
9: 31 c0 xor %ax,%ax
b: 50 push %ax
c: 68 2f 2f push $0x2f2f
f: 73 68 jae 0x79
11: 68 2f 62 push $0x622f
14: 69 6e 89 e3 50 imul $0x50e3,-0x77(%bp),%bp
19: 53 push %bx
1a: 89 e1 mov %sp,%cx
1c: b0 0b mov $0xb,%al
1e: cd 80 int $0x80
Or:
$ ndisasm shellcode
00000000 90 nop
00000001 90 nop
00000002 90 nop
00000003 90 nop
00000004 90 nop
00000005 90 nop
00000006 90 nop
00000007 90 nop
00000008 90 nop
00000009 31C0 xor ax,ax
0000000B 50 push ax
0000000C 682F2F push word 0x2f2f
0000000F 7368 jnc 0x79
00000011 682F62 push word 0x622f
00000014 696E89E350 imul bp,[bp-0x77],word 0x50e3
00000019 53 push bx
0000001A 89E1 mov cx,sp
0000001C B00B mov al,0xb
0000001E CD80 int 0x80
This shellcode contains strings which are interpreted as x86 instructions. Is there a way to put proper labels on jumps?
And is there a way to display the strings as strings instead of decoding them as x86 instructions? I know this is not easy because there is no ELF file with sections and headers...
If you had shellcode which used call or jmp to jump over some data, you'd have to replace the strings with NOPs if the disassembler got out of sync while treating the data as instructions, as @DavidJ suggested.
In this case, you're just disassembling in the wrong mode.
The jnc is clearly bogus (as I think you realized).
The disassembler is treating the push opcode (the 0x68 byte) as the start of push imm16, because that's how 16-bit mode works. But in 32 and 64-bit modes, the same opcode is the start of a push imm32. So the push instruction is actually 5 bytes instead of 3, and the next instruction is actually the next push.
The bogus short jnc is a huge hint that this is not 16-bit code.
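To make the length difference concrete, here is a rough Python sketch that decodes only opcode 0x68 under an assumed operand size. The helper name decode_push_imm and its bits parameter are made up for illustration, and prefixes are ignored; it is not a real disassembler.

```python
# Hypothetical helper: decodes only PUSH imm16/imm32 (opcode 0x68).
def decode_push_imm(code: bytes, offset: int, bits: int):
    """Return (instruction_length, immediate) for a 0x68 push at `offset`.

    `bits` is the assumed default operand size: 16 selects imm16,
    anything else selects imm32. Prefix bytes are deliberately ignored.
    """
    if code[offset] != 0x68:
        raise ValueError("not a push imm16/imm32 opcode")
    imm_size = 2 if bits == 16 else 4
    imm = int.from_bytes(code[offset + 1 : offset + 1 + imm_size], "little")
    return 1 + imm_size, imm

# The two pushes from the shellcode (offsets 0xc to 0x15 in the dump).
pushes = bytes.fromhex("682f2f7368" "682f62696e")

len16, imm16 = decode_push_imm(pushes, 0, bits=16)
len32, imm32 = decode_push_imm(pushes, 0, bits=32)
print(len16, hex(imm16))  # 3 0x2f2f
print(len32, hex(imm32))  # 5 0x68732f2f

# In 16-bit mode decoding resumes 3 bytes in, at bytes 73 68,
# which is exactly where the bogus jnc comes from.
print(pushes[len16:len16 + 2].hex())  # 7368
```

With bits=32 the decoder consumes all five bytes and lands on the second 0x68, matching what ndisasm -b32 shows.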
Use ndisasm -b32 or -b64. Ndisasm can read input from stdin, so I used python2 -c 'print "... "' | ndisasm - -b32.
When using objdump, if you prefer Intel syntax, use objdump -d -Mintel. So you could objdump -Mintel -bbinary -D -mi386 /tmp/shellcode for 32-bit (-mi386 selects x86 as the architecture (rather than ARM or MIPS or whatever), and implies -Mi386 32-bit mode as well).
Or for 64-bit, objdump -D -b binary -mi386 -Mx86-64 /tmp/shellcode works. (objdump won't read the binary from stdin :/) Check the objdump man page for more about -M options.
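Since objdump needs a real file, one option is to write the bytes out from Python 3. Python 3's print would not reproduce the python2 one-liner, because it encodes "\x90" as UTF-8 (two bytes, c2 90), so writing a bytes object in binary mode is safer. The file name shellcode.bin and the helper write_shellcode are assumptions made for this sketch.

```python
import os
import tempfile

# Same bytes as the tutorial's shellcode, without the trailing space and
# the newline that print appends.
SHELLCODE = bytes.fromhex(
    "909090909090909090"  # 9 x nop
    "31c0"                # xor eax,eax
    "50"                  # push eax
    "682f2f7368"          # push dword 0x68732f2f
    "682f62696e"          # push dword 0x6e69622f
    "89e3"                # mov ebx,esp
    "50"                  # push eax
    "53"                  # push ebx
    "89e1"                # mov ecx,esp
    "b00b"                # mov al,0xb
    "cd80"                # int 0x80
)

def write_shellcode(path: str, code: bytes = SHELLCODE) -> int:
    """Write the raw bytes to `path` and return the number of bytes written."""
    with open(path, "wb") as f:
        return f.write(code)

# Hypothetical output location; any writable path works.
path = os.path.join(tempfile.gettempdir(), "shellcode.bin")
written = write_shellcode(path)
print(written, "bytes written to", path)
```

The resulting file can then be passed to objdump -D -b binary -mi386 -Mintel in place of /tmp/shellcode.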
I use this alias in my ~/.bashrc: alias disas='objdump -drwC -Mintel', because I normally disassemble ELF executables / objects to see what a compiler did, not shellcode. You might want -D in your alias.
I'm pretty sure this is 32-bit code, because in 64-bit mode the two pushes would leave a gap. There is no push imm64, but push imm32 is a 64-bit push with the immediate sign-extended to 64 bits. In 64-bit mode, you might use
push 'abcd'
mov dword [rsp+4], 'efgh'
to end up with rsp pointing to "abcdefgh".
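The gap can be seen by modelling the stack as a byte array. This is only a sketch of the memory effect of push imm32 in 64-bit mode; the function name push_imm32_64bit and the 32-byte toy stack are invented for illustration.

```python
import struct

def push_imm32_64bit(stack: bytearray, rsp: int, imm32: int) -> int:
    """Model push imm32 in 64-bit mode: sign-extend to 8 bytes, rsp -= 8."""
    # Reinterpret the 32-bit pattern as signed so packing as int64 sign-extends.
    value = struct.unpack("<i", struct.pack("<I", imm32))[0]
    rsp -= 8
    stack[rsp:rsp + 8] = struct.pack("<q", value)
    return rsp

# Two plain pushes, as in the shellcode: each half of the path is followed
# by four bytes of padding, so the string is no longer contiguous.
stack = bytearray(32)
rsp = len(stack)
rsp = push_imm32_64bit(stack, rsp, int.from_bytes(b"//sh", "little"))
rsp = push_imm32_64bit(stack, rsp, int.from_bytes(b"/bin", "little"))
print(bytes(stack[rsp:rsp + 16]))  # b'/bin\x00\x00\x00\x00//sh\x00\x00\x00\x00'

# push 'abcd' followed by a dword store into the upper half of the slot.
stack = bytearray(32)
rsp = len(stack)
rsp = push_imm32_64bit(stack, rsp, int.from_bytes(b"abcd", "little"))
stack[rsp + 4:rsp + 8] = b"efgh"  # models: mov dword [rsp+4], 'efgh'
print(bytes(stack[rsp:rsp + 8]))  # b'abcdefgh'
```

In the first case a C-string reader starting at rsp would stop after "/bin", which is why the same byte sequence would not work as 64-bit code.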
Also, the use of int 0x80 with a stack address is a big clue this is not 64-bit code. int 0x80 works on Linux in 64-bit mode, but it truncates all inputs to 32-bit: see What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?
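As a small illustration of why truncation matters for a stack pointer, the sketch below masks a 64-bit value down to 32 bits. The address is a made-up example of where a 64-bit Linux stack might live, not a value taken from a real process.

```python
def truncate_to_32(reg: int) -> int:
    """Keep only the low 32 bits, as the int 0x80 path does with its inputs."""
    return reg & 0xFFFFFFFF

# Hypothetical 64-bit stack address (an assumption for illustration only).
rsp = 0x7FFD12345678

seen_by_kernel = truncate_to_32(rsp)
print(hex(seen_by_kernel))    # 0x12345678
print(seen_by_kernel == rsp)  # False: the pointer no longer refers to the stack
```

With a pointer mangled like this, the system call would most likely fail instead of finding the path string.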
The 32-bit disassembly from ndisasm is:
00000000 90 nop
00000001 90 nop
00000002 90 nop
00000003 90 nop
00000004 90 nop
00000005 90 nop
00000006 90 nop
00000007 90 nop
00000008 90 nop
00000009 31C0 xor eax,eax
0000000B 50 push eax
0000000C 682F2F7368 push dword 0x68732f2f
00000011 682F62696E push dword 0x6e69622f
00000016 89E3 mov ebx,esp
00000018 50 push eax
00000019 53 push ebx
0000001A 89E1 mov ecx,esp
0000001C B00B mov al,0xb
0000001E CD80 int 0x80
00000020 200A and [edx],cl
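Which looks sane. The final and [edx],cl is not real code: it is the trailing space (0x20) inside the quoted string plus the newline (0x0a) that print appends. The shellcode contains no branches, but to answer the question:
Is there a way to put proper labels on jumps?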
Yes, Agner Fog's objconv disassembler can put labels on branch targets to help you figure out which branch goes where.
See How do I disassemble raw x86 code?
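On the other part of the question, displaying the strings: in this shellcode the string lives in the push immediates, not in a separate data area, so one way to read it is to convert the immediates from the 32-bit listing back to bytes. A minimal sketch follows, with imm32_to_text as a made-up helper name; it assumes the immediates are plain ASCII.

```python
import struct

def imm32_to_text(imm: int) -> str:
    """Return the four ASCII characters stored by a little-endian imm32."""
    return struct.pack("<I", imm).decode("ascii")

# Immediates in program order, copied from the 32-bit disassembly.
pushed = [0x68732F2F, 0x6E69622F]

for imm in pushed:
    print(hex(imm), repr(imm32_to_text(imm)))

# The stack grows down, so the last push ends up at the lowest address.
# Reading memory upward from esp therefore visits the pushes in reverse.
path = "".join(imm32_to_text(imm) for imm in reversed(pushed))
print(path)  # /bin//sh
```

The earlier push eax (with eax zeroed) supplies the terminating zero bytes, so ebx ends up pointing at "/bin//sh", and al = 0xb is the execve system call number in the 32-bit Linux ABI.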