I'm new to assembly programming and I have a question about the null character in the .data section.
I tested a few variants of the same program:
Code 1:
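section .data
    out:  db "%s",10,0      ; format string "%s\n" with terminating 0
    mes1: db "a",0
    mes2: db "b",0
section .text
    extern printf
    global main
main:
    push rbp                ; also aligns the stack to 16 bytes for the calls
    mov rdi,out             ; 1st argument: format string
    mov rsi,mes1            ; 2nd argument: string for %s
    mov rax,0               ; al = 0: no vector registers used (variadic call)
    call printf
    mov rdi,out
    mov rsi,mes2
    mov rax,0
    call printf
    pop rbp
    mov rax,0               ; return 0 from main
    ret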
Output is:
a
b
Code 2: changed the .data section to:
section .data
out: db "%s",10 ; no 0
mes1: db "a",0
mes2: db "b",0
Output is:
a
ab
a
Code 3: changed the .data section to:
section .data
out: db "%s",10,0
mes1: db "a"
mes2: db "b"
Output is:
ab
b
So what does the null character do?
I tried to debug it in pwndbg but I didn't get anything interesting.
So what does the null character do?
It tells the consumer of the string (e.g. printf) where the string ends. But that alone does not explain the different results that you got. A second element to consider is how the strings out, mes1, and mes2 are stored in memory. It's important to note that they are stored contiguously, in the order in which they are defined, and that the memory right after the last item almost certainly contains one or more zero bytes.
Code 1:
out: db "%s",10,0
mes1: db "a",0
mes2: db "b",0
                                 null from zero-initialized .data section
                                 v
"%", "s", 10, 0, "a", 0, "b", 0, 0, ...
<---- out ---->
                 <mes1>
                         <mes2>
Output:
a
b
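To make the role of the terminator concrete, here is a minimal C sketch (not code from the question) that models the Code 1 bytes as one array and walks from a label's offset to the first zero byte, which is roughly what printf does for the format string and for a %s argument. The array name code1_image, the offsets, and the helper length_from are hypothetical names chosen for illustration, and the final zero is an assumption about what follows the last label.

```c
#include <stddef.h>

/* Hypothetical model of the Code 1 bytes, in definition order. */
static const unsigned char code1_image[] = {
    '%', 's', 10, 0,   /* out  at offset 0 */
    'a', 0,            /* mes1 at offset 4 */
    'b', 0,            /* mes2 at offset 6 */
    0                  /* assumed zero byte after the last label */
};

/* Count the bytes from `start` up to, but not including, the first
   zero byte. Assumes a zero byte exists at or after `start`. */
size_t length_from(const unsigned char *image, size_t start)
{
    size_t n = 0;
    while (image[start + n] != 0)
        n++;
    return n;
}
```

With this layout, length_from(code1_image, 0) is 3 (the bytes '%', 's', and the newline), and offsets 4 and 6 each give 1, so every string ends exactly where its own 0 was written.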
Code 2:
out: db "%s",10
mes1: db "a",0
mes2: db "b",0
                              null from zero-initialized .data section
                              v
"%", "s", 10, "a", 0, "b", 0, 0, ...
<------- out ------>
              <mes1>
                      <mes2>
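The format string now includes an extra literal character 'a' after the newline, because printf keeps reading the format until it reaches the 0 that terminates mes1. Each call therefore prints its argument, a newline, and then that 'a'.
Output:
a
ab
a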
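The same effect can be reproduced in C with a small sketch, assuming the Code 2 bytes are modelled as one array. The names code2_image and render and the offsets are hypothetical, and snprintf stands in for printf so that the result can be inspected instead of printed.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical model of the Code 2 bytes: out has no terminator of its own. */
static const char code2_image[] = {
    '%', 's', '\n',    /* out  at offset 0, no zero byte */
    'a', 0,            /* mes1 at offset 3 */
    'b', 0,            /* mes2 at offset 5 */
    0                  /* assumed zero byte after the last label */
};

/* Format the string at arg_off with the format that starts at fmt_off,
   writing the result into buf. Returns what snprintf returns. */
int render(const char *image, size_t fmt_off, size_t arg_off,
           char *buf, size_t buf_size)
{
    return snprintf(buf, buf_size, image + fmt_off, image + arg_off);
}
```

Here render(code2_image, 0, 3, buf, sizeof buf) produces "a\na" and render(code2_image, 0, 5, buf, sizeof buf) produces "b\na". Printed back to back, those two results give the three lines a, ab, a.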
Code 3:
out: db "%s",10,0
mes1: db "a"
mes2: db "b"
                           null from zero-initialized .data section
                           v
"%", "s", 10, 0, "a", "b", 0, ...
<---- out ---->
                 <-- mes1 ->
                      <mes2>
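The first message got longer by one character, because reading from mes1 now continues through the 'b' of mes2. Both messages still end up zero-terminated, but only thanks to the zero byte that happens to follow the last item in the .data section, so it is safer to terminate every string explicitly.
Output:
ab
b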
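The overlap between the two messages can be checked with a short C sketch, again assuming the bytes are modelled as one array. The names code3_image, seen_length, and runs_into are hypothetical, and the final zero in the array is an assumption, not something a real binary guarantees.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the Code 3 bytes: neither message has its own 0. */
static const char code3_image[] = {
    '%', 's', '\n', 0, /* out  at offset 0 */
    'a',               /* mes1 at offset 4, no zero byte */
    'b',               /* mes2 at offset 5, no zero byte */
    0                  /* assumed zero byte after the last label */
};

/* Length of the string that a %s consumer sees starting at `offset`. */
size_t seen_length(const char *image, size_t offset)
{
    return strlen(image + offset);
}

/* Nonzero when the string starting at `first` includes the byte at `second`. */
int runs_into(const char *image, size_t first, size_t second)
{
    return second >= first && second < first + strlen(image + first);
}
```

In this model seen_length(code3_image, 4) is 2 and seen_length(code3_image, 5) is 1, and runs_into(code3_image, 4, 5) is nonzero: mes1 reads as "ab" only because it runs straight through the byte that belongs to mes2.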