Disassemble strings properly in shellcode

I am learning about shellcode.

I have found this shellcode in a tutorial:

python -c 'print "\x90\x90\x90\x90\x90\x90\x90\x90\x90\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x53\x89\xe1\xb0\x0b\xcd\x80 "' > shellcode

What I want to do is disassemble this very basic shellcode in order to understand how it works.

Here is what I did:

$ objdump -D -b binary -m i8086 shellcode 

shellcode:     file format binary


Disassembly of section .data:

00000000 <.data>:
   0:   90                      nop
   1:   90                      nop
   2:   90                      nop
   3:   90                      nop
   4:   90                      nop
   5:   90                      nop
   6:   90                      nop
   7:   90                      nop
   8:   90                      nop
   9:   31 c0                   xor    %ax,%ax
   b:   50                      push   %ax
   c:   68 2f 2f                push   $0x2f2f
   f:   73 68                   jae    0x79
  11:   68 2f 62                push   $0x622f
  14:   69 6e 89 e3 50          imul   $0x50e3,-0x77(%bp),%bp
  19:   53                      push   %bx
  1a:   89 e1                   mov    %sp,%cx
  1c:   b0 0b                   mov    $0xb,%al
  1e:   cd 80                   int    $0x80

Or:

$ ndisasm shellcode 
00000000  90                nop
00000001  90                nop
00000002  90                nop
00000003  90                nop
00000004  90                nop
00000005  90                nop
00000006  90                nop
00000007  90                nop
00000008  90                nop
00000009  31C0              xor ax,ax
0000000B  50                push ax
0000000C  682F2F            push word 0x2f2f
0000000F  7368              jnc 0x79
00000011  682F62            push word 0x622f
00000014  696E89E350        imul bp,[bp-0x77],word 0x50e3
00000019  53                push bx
0000001A  89E1              mov cx,sp
0000001C  B00B              mov al,0xb
0000001E  CD80              int 0x80

This shellcode contains strings which are interpreted as x86 instructions. Is there a way to put proper labels on jumps?

And is there a way to display strings instead of decoding them as x86 instructions? I know this is not easy because there is no ELF file with sections and headers...

Solution

If you had shellcode which used call or jmp to jump over some data, you'd have to replace the strings with NOPs if the disassembler got out of sync while treating the data as instructions, as @DavidJ suggested.

In this case, you're just disassembling in the wrong mode. The jnc is clearly bogus (as I think you realized).

The disassembler is treating the push opcode (the 0x68 byte) as the start of push imm16, because that's how 16-bit mode works. But in 32 and 64-bit modes, the same opcode is the start of a push imm32. So the push instruction is actually 5 bytes instead of 3, and the next instruction is actually the next push.
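To make the length difference concrete, here is a minimal sketch that decodes the 0x68 opcode by hand in both modes. The helper decode_push_imm is a hypothetical name of mine, not part of any disassembler, and it only handles this one opcode with no prefixes.

```python
# Bytes of the shellcode starting at offset 0x0c, copied from the listings.
code = bytes.fromhex("682f2f7368682f62696e89e3")


def decode_push_imm(buf, offset, operand_bits):
    """Decode `push imm` (opcode 0x68) at `offset`.

    operand_bits is 16 for 16-bit mode and 32 for 32-bit mode.
    Returns (immediate, offset_of_next_instruction).
    Hypothetical helper for illustration only: no prefixes, no other opcodes.
    """
    assert buf[offset] == 0x68, "not a push imm16/imm32 opcode"
    imm_len = operand_bits // 8
    imm = int.from_bytes(buf[offset + 1:offset + 1 + imm_len], "little")
    return imm, offset + 1 + imm_len


imm16, next16 = decode_push_imm(code, 0, 16)
imm32, next32 = decode_push_imm(code, 0, 32)

# 16-bit mode: 3-byte push, and the next opcode byte is 0x73 (jnc/jae rel8).
print(hex(imm16), next16, hex(code[next16]))
# 32-bit mode: 5-byte push, and the next opcode byte is 0x68 (another push).
print(hex(imm32), next32, hex(code[next32]))
```

In 16-bit mode the decoder lands on the 0x73 byte of the string, which is where the bogus jnc comes from; in 32-bit mode it lands on the second 0x68.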

The bogus short-jnc is a huge hint that this is not 16-bit code.

Use ndisasm -b32 or -b64. Ndisasm can read input from stdin, so I used python2 -c 'print "... "' | ndisasm - -b32.
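If you only have Python 3, the print one-liner from the question won't work as written. A rough equivalent, assuming you save it under a name of your choice (gen_shellcode.py here is just a placeholder), writes the raw bytes to stdout instead:

```python
import sys

# The shellcode bytes from the question, without the trailing space that the
# original one-liner has inside its string literal.
shellcode = bytes.fromhex(
    "909090909090909090"  # 9 x nop
    "31c0"                # xor eax,eax
    "50"                  # push eax
    "682f2f7368"          # push dword 0x68732f2f
    "682f62696e"          # push dword 0x6e69622f
    "89e3"                # mov ebx,esp
    "50"                  # push eax
    "53"                  # push ebx
    "89e1"                # mov ecx,esp
    "b00b"                # mov al,0xb
    "cd80"                # int 0x80
)

if __name__ == "__main__":
    # Write raw bytes: a text-mode print() would encode bytes >= 0x80 as
    # multi-byte UTF-8 and append a newline.
    sys.stdout.buffer.write(shellcode)
```

Then something like python3 gen_shellcode.py | ndisasm - -b32 should give the same listing, minus the two junk bytes at the end.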

When using objdump, if you prefer Intel syntax, use objdump -d -Mintel. So for 32-bit you could run objdump -Mintel -bbinary -D -mi386 /tmp/shellcode. (-mi386 selects x86 as the architecture, rather than ARM or MIPS or whatever, and implies -Mi386 32-bit mode as well.)

Or for 64-bit, objdump -D -b binary -mi386 -Mx86-64 /tmp/shellcode works. (objdump won't read the binary from stdin :/) Check the objdump man page for more about -M options.

I use this alias in my ~/.bashrc: alias disas='objdump -drwC -Mintel', because I normally disassemble ELF executables / objects to see what a compiler did, not shellcode. You might want -D in your alias.

I'm pretty sure this is 32-bit code, because in 64-bit mode the two pushes would leave a gap. There is no push imm64, but push imm32 is a 64-bit push with the immediate sign-extended to 64 bits. In 64-bit mode, you might use

push  'abcd'
mov   dword [rsp+4], 'efgh'

to end up with rsp pointing to "abcdefgh".
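To see the gap, here is a small sketch that models what the two pushes from this shellcode would leave on the stack if they ran in 64-bit mode. push64_imm32 is a hypothetical helper that only models the sign-extension and the 8-byte store, nothing else about the CPU.

```python
import struct


def push64_imm32(stack, imm32):
    """Model `push imm32` in 64-bit mode.

    The immediate is sign-extended to 64 bits and 8 bytes are stored.
    `stack` is a bytes object whose first byte is at the current rsp.
    """
    # Reinterpret the 32-bit pattern as a signed value, then widen to 64 bits.
    signed = struct.unpack("<i", struct.pack("<I", imm32))[0]
    # The new value lands at the lowest address, in front of what is there.
    return struct.pack("<q", signed) + stack


stack = b""
stack = push64_imm32(stack, 0x68732F2F)  # bytes '//sh'
stack = push64_imm32(stack, 0x6E69622F)  # bytes '/bin'
print(stack)  # b'/bin\x00\x00\x00\x00//sh\x00\x00\x00\x00'
```

A C string read from rsp would stop at the first zero byte and be just "/bin", not "/bin//sh", so the shellcode wouldn't work as intended in 64-bit mode.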

Also, the use of int 0x80 with a stack address is a big clue this is not 64-bit code. int 0x80 works on Linux in 64-bit mode, but it truncates all inputs to 32-bit: see "What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?"

The 32-bit disassembly from ndisasm is:

00000000  90                nop
00000001  90                nop
00000002  90                nop
00000003  90                nop
00000004  90                nop
00000005  90                nop
00000006  90                nop
00000007  90                nop
00000008  90                nop
00000009  31C0              xor eax,eax
0000000B  50                push eax
0000000C  682F2F7368        push dword 0x68732f2f
00000011  682F62696E        push dword 0x6e69622f
00000016  89E3              mov ebx,esp
00000018  50                push eax
00000019  53                push ebx
0000001A  89E1              mov ecx,esp
0000001C  B00B              mov al,0xb
0000001E  CD80              int 0x80
00000020  200A              and [edx],cl

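Which looks sane. (The trailing and [edx],cl at offset 0x20 is just the space at the end of the string plus the newline that print appends, decoded as an instruction.) It contains no branches, but to answer your other question: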

Is there a way to put proper labels on jumps?

Yes, Agner Fog's objconv disassembler can put labels on branch targets to help you figure out which branch goes where. See "How do I disassemble raw x86 code?"
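As for displaying the strings: in this shellcode they are not separate data bytes. They are the immediates of the two push dword instructions, so one way to read them is to decode those immediates by hand. A minimal sketch, using the values from the 32-bit listing:

```python
import struct

# Immediates in the order they are pushed, from the 32-bit listing.
pushed = [0x68732F2F, 0x6E69622F]

# The stack grows downward, so the last value pushed sits at the lowest
# address; reversing the push order gives the bytes in memory order.
in_memory = b"".join(struct.pack("<I", imm) for imm in reversed(pushed))
print(in_memory.decode("ascii"))  # /bin//sh
```

The push eax before them (with eax zeroed) supplies the terminating zero bytes, so ebx ends up pointing at the C string "/bin//sh".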