I am learning to program in assembly language, and I found this code. I cannot understand how one of its instructions is executed:
xor eax,eax
xor ebx,ebx
xor ecx,ecx
xor edx,edx
jmp short string
code:
pop ecx
mov bl,1
mov dl,13
mov al,4
int 0x80
dec bl
mov al,1
int 0x80
string:
call code
db 'hello, world!'
After the call to code, why is the db instruction executed, given that the call instruction is executed before it?
Just to point out what I meant by "other ways of defining byte values": this variant of your code does the same thing, but it shows how to define a string with instructions, and how to define instructions with the db directive. Both make the source harder for a human to read, but for the assembler the difference is negligible: it produces the same binary machine code, and for the CPU the same machine code is the same machine code; it does not care what your source looked like.
I also tried to comment each line extensively: what it does, and why it is used in the code.
The code is written in this non-trivial way because it is an example of a shell-exploit payload, where your assembly must not only do what you want, but its resulting machine code must also conform to additional constraints: it can't contain any zero byte (a zero makes it difficult to pass the payload around as a "string" while injecting it with some exploit), and it must be PIC (position-independent code), so it can't use any absolute address or assume any particular position while being executed.
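To make the "no zero byte" constraint concrete, here is a small Python sketch that scans machine code for 0x00 bytes. The byte values are hand-assembled 32-bit encodings, so treat them as assumptions and compare them with the objdump output of your own build; the helper name has_zero_byte is made up for this illustration.

```python
# Hand-assembled 32-bit x86 encodings (assumptions; confirm with objdump).
MOV_EBX_1 = bytes.fromhex("bb 01 00 00 00")   # mov ebx,1
XOR_MOV_BL = bytes.fromhex("31 db b3 01")     # xor ebx,ebx / mov bl,1

# The whole payload from the question, as the assembler is expected to emit it.
PAYLOAD = bytes.fromhex(
    "31 c0 31 db 31 c9 31 d2 "        # xor eax,eax .. xor edx,edx
    "eb 0f "                          # jmp short string
    "59 b3 01 b2 0d b0 04 cd 80 "     # pop ecx .. int 0x80 (sys_write)
    "fe cb b0 01 cd 80 "              # dec bl, mov al,1, int 0x80 (sys_exit)
    "e8 ec ff ff ff"                  # call code
) + b"hello, world!"


def has_zero_byte(code: bytes) -> bool:
    """Return True if the machine code contains a 0x00 byte."""
    return 0 in code


print(has_zero_byte(MOV_EBX_1))    # True: the imm32 operand carries three zeroes
print(has_zero_byte(XOR_MOV_BL))   # False
print(has_zero_byte(PAYLOAD))      # False
```

The same check explains why the registers are cleared with xor and then loaded through their 8-bit parts: the short forms never need a zero-padded 32-bit immediate.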
; sets basic registers eax,ebx,ecx,edx to zero (ecx not needed BTW)
xor eax,eax
db '1', 0xDB ; xor ebx,ebx defined by "db" for fun
db '1', 0xC9 ; xor ecx,ecx defined by "db" for fun
xor edx,edx
; short jump forward, so that the later "call code" gets a
; negative relative offset and a zero in the "call" opcode is avoided;
; a "call code" from here would need zeroes in its rel32 offset encoding
jmp short string ; the "jmp short string" is encoded as "EB 0F"
code:
pop ecx ; loads the address of string from the stack into ecx
mov bl,1 ; ebx = 1 = STD_OUT stream; "mov ebx,1" would contain zeroes
; in its opcode, so "xor ebx,ebx" + "mov bl,1" is used instead
mov dl,13 ; edx = 13 = length of string
mov al,4 ; eax = 4 = sys_write
int 0x80 ; sys_write(STD_OUT, 'hello, world!', 13);
dec bl ; ebx = 0 = exit code "OK"
mov al,1 ; eax = 1 = sys_exit
int 0x80 ; sys_exit(0);
string:
call code ; return address == string address -> pushed on stack
; also "code:" is ahead, so relative offset is negative => no zero in opcode
; resulting call opcode is "E8 EC FF FF FF"
; following bytes are NOT executed as code, they contain string data
push 0x6f6c6c65 ; 'hello'
sub al,0x20 ; ', '
ja short $+0x6f+2 ; 'wo'
jb short $+0x6c+2 ; 'rl'
db 'd!'
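As a quick check that the instruction forms really spell out the string, the following Python sketch concatenates their encodings and compares the result with the text. The encodings are written by hand from the one-byte opcodes (68 for push imm32, 2C for sub al,imm8, 77 for ja rel8, 72 for jb rel8), so they are assumptions to verify against objdump, not assembler output.

```python
# Encodings of the "string defined by instructions" lines (hand-assembled).
push_imm32 = bytes([0x68]) + (0x6F6C6C65).to_bytes(4, "little")  # push 0x6f6c6c65
sub_al_imm8 = bytes([0x2C, 0x20])                                # sub al,0x20
ja_rel8 = bytes([0x77, 0x6F])                                    # ja short $+0x6f+2
jb_rel8 = bytes([0x72, 0x6C])                                    # jb short $+0x6c+2
tail = b"d!"                                                     # db 'd!'

as_instructions = push_imm32 + sub_al_imm8 + ja_rel8 + jb_rel8 + tail
as_data = b"hello, world!"                                       # db 'hello, world!'

print(as_instructions == as_data)        # True

# The reverse direction: db '1', 0xDB is the encoding of xor ebx,ebx (31 DB),
# because ASCII '1' is 0x31.
print(b"1\xdb" == bytes([0x31, 0xDB]))   # True
```

Since these bytes sit after the call, which never returns, the CPU never decodes them as push, sub, ja or jb; they are only read as data by sys_write.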
To build it I used nasm -f elf *.asm; ld -m elf_i386 -s -o demo *.o (ignore the warnings). To disassemble the result and check how the actual machine code forms instructions, you can use objdump -M intel -d demo.
(The code above and objdump also work on this online site, if you want to test it out: http://www.tutorialspoint.com/compile_assembly_online.php)
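The offsets in the "EB 0F" and "E8 EC FF FF FF" encodings can also be reproduced by hand. The Python sketch below assumes the instruction sizes listed in its comments (my own hand count, relative to the start of the payload) and applies the usual rule that a relative jump or call offset is measured from the end of the jump or call instruction itself.

```python
# Instruction sizes in bytes, in source order (assumed, hand-counted).
PROLOGUE = 4 * 2                       # four "xor reg,reg", 2 bytes each
JMP_SIZE = 2                           # jmp short string (EB rel8)
CODE_BLOCK = [1, 2, 2, 2, 2, 2, 2, 2]  # pop ecx .. second int 0x80
CALL_SIZE = 5                          # call code (E8 rel32)

jmp_addr = PROLOGUE                         # offset of the jmp
code_addr = jmp_addr + JMP_SIZE             # label "code:"
string_addr = code_addr + sum(CODE_BLOCK)   # label "string:" (the call itself)
data_addr = string_addr + CALL_SIZE         # first byte of 'hello, world!'

# Relative offsets are counted from the end of the jumping instruction.
jmp_rel8 = string_addr - (jmp_addr + JMP_SIZE)
call_rel32 = code_addr - (string_addr + CALL_SIZE)

jmp_bytes = bytes([0xEB, jmp_rel8])
call_bytes = bytes([0xE8]) + call_rel32.to_bytes(4, "little", signed=True)

print(jmp_bytes.hex(" "))    # eb 0f
print(call_bytes.hex(" "))   # e8 ec ff ff ff

# A hypothetical forward call over 15 bytes would need zero bytes in rel32.
print((15).to_bytes(4, "little", signed=True).hex(" "))   # 0f 00 00 00
```

Note that data_addr is also the return address that call pushes on the stack, which is why the following pop ecx ends up with the address of the string rather than the CPU ever executing the string bytes.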