I am studying assembly, compilers (LLVM) and lifters.
I can write plain assembly code with NASM, like the code below:
section .data
hello_string db "Hello World!", 0x0d, 0x0a
hello_string_len equ $ - hello_string
section .text
global _start
_start:
mov eax, 4                 ; eax <- 4, syscall number (write)
mov ebx, 1                 ; ebx <- 1, syscall argument1 (stdout)
mov ecx, hello_string      ; ecx <- hello_string, syscall argument2 (string ptr)
mov edx, hello_string_len  ; edx <- hello_string_len, syscall argument3 (string len)
int 0x80                   ; syscall
mov eax, 1                 ; eax <- 1, syscall number (exit)
mov ebx, 0                 ; ebx <- 0, syscall argument1 (return value)
int 0x80                   ; syscall
;nasm -felf32 hello.x86.s -o hello.o
;ld -m elf_i386 hello.o -o hello.out
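For comparison, here is a rough C analogue of the same two system calls. It's a sketch assuming Linux and glibc's syscall() wrapper; say_hello and exit_zero are hypothetical names. The asm above hard-codes the i386 int 0x80 numbers (4 = write, 1 = exit), while SYS_write and SYS_exit from <sys/syscall.h> expand to the right numbers for whatever architecture you compile for (they differ on x86-64).

```c
#define _GNU_SOURCE          /* for syscall() in <unistd.h> */
#include <sys/syscall.h>     /* SYS_write, SYS_exit */
#include <unistd.h>

/* Same 14 bytes as hello_string: "Hello World!" followed by 0x0d 0x0a. */
static const char hello_string[] = "Hello World!\r\n";
/* Like NASM's `$ - hello_string`; the -1 drops the NUL terminator C adds. */
#define HELLO_STRING_LEN (sizeof hello_string - 1)

/* First int 0x80 block: write(fd, hello_string, hello_string_len).
   Returns the number of bytes written, or -1 on error. */
static long say_hello(int fd) {
    return syscall(SYS_write, fd, hello_string, HELLO_STRING_LEN);
}

/* Second int 0x80 block: exit(0). The process ends here; nothing returns. */
static void exit_zero(void) {
    syscall(SYS_exit, 0);
}
```

Calling say_hello(1) and then exit_zero() mirrors the whole _start sequence. HELLO_STRING_LEN is 14, which matches the 0x0e value readelf reports for hello_string_len.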
Then I checked the object file.
There I can't find any function. I also understand that the call and ret instructions are each equivalent to a combination of simpler instructions.
$readelf -s hello.o
Symbol table '.symtab' contains 7 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 00000000 0 NOTYPE LOCAL DEFAULT UND
1: 00000000 0 FILE LOCAL DEFAULT ABS hello.x86.s
2: 00000000 0 SECTION LOCAL DEFAULT 1
3: 00000000 0 SECTION LOCAL DEFAULT 2
4: 00000000 0 NOTYPE LOCAL DEFAULT 1 hello_string
5: 0000000e 0 NOTYPE LOCAL DEFAULT ABS hello_string_len
6: 00000000 0 NOTYPE GLOBAL DEFAULT 2 _start
But if I compile a C program and check the object file with readelf, then I can find "functions":
$readelf -s function.o | grep FUNC
3: 0000000000000000 18 FUNC GLOBAL DEFAULT 2 add
4: 0000000000000020 43 FUNC GLOBAL DEFAULT 2 main
Here I can see which symbols are functions.
How is a function different from a NOTYPE label?
ELF symbol metadata can be set by some assemblers, e.g. in NASM, global main:function marks the symbol type as FUNC (https://nasm.us/doc/nasmdoc8.html#section-8.9.5).
The GAS syntax equivalent (which C compilers emit) is .type main, @function. For example, put some code on https://godbolt.org and disable filtering of directives to see them in the compiler's asm output.
But note this is just metadata for linkers and debuggers to use; the CPU doesn't see that when executing. That's why nobody bothers with it for NASM examples.
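To see how little is behind the FUNC / NOTYPE distinction: it is just the low 4 bits of the st_info byte in each symbol table entry, and readelf only decodes that field. The sketch below reads the .symtab of a file using the structs and macros from glibc's <elf.h>. It assumes Linux and handles 64-bit ELF only (so function.o, but not the 32-bit hello.o, which would need the Elf32_* types); sym_type_name and elf64_symbol_type are hypothetical helper names.

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Decode the type nibble of st_info the way readelf -s prints it. */
static const char *sym_type_name(unsigned char st_info) {
    switch (ELF64_ST_TYPE(st_info)) {
    case STT_NOTYPE:  return "NOTYPE";
    case STT_OBJECT:  return "OBJECT";
    case STT_FUNC:    return "FUNC";
    case STT_SECTION: return "SECTION";
    case STT_FILE:    return "FILE";
    default:          return "OTHER";
    }
}

/* Look up `name` in the .symtab of a 64-bit ELF file. Returns its STT_* type,
   or -1 if the file can't be read or has no such symbol.
   Sketch only: ELF64 only, no extended section numbering. */
static int elf64_symbol_type(const char *path, const char *name) {
    int result = -1;
    Elf64_Ehdr eh;
    Elf64_Shdr *sh = NULL;
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 || eh.e_shnum == 0)
        goto done;
    sh = calloc(eh.e_shnum, sizeof *sh);            /* all section headers */
    if (!sh || fseek(f, (long)eh.e_shoff, SEEK_SET) != 0 ||
        fread(sh, sizeof *sh, eh.e_shnum, f) != eh.e_shnum)
        goto done;
    for (int i = 0; i < eh.e_shnum && result < 0; i++) {
        if (sh[i].sh_type != SHT_SYMTAB || sh[i].sh_link >= eh.e_shnum)
            continue;
        const Elf64_Shdr *st = &sh[sh[i].sh_link];  /* symbol-name string table */
        size_t nsyms = sh[i].sh_size / sizeof(Elf64_Sym);
        char *strs = malloc(st->sh_size);
        Elf64_Sym *syms = malloc(nsyms * sizeof *syms);
        if (strs && syms &&
            fseek(f, (long)st->sh_offset, SEEK_SET) == 0 &&
            fread(strs, 1, st->sh_size, f) == st->sh_size &&
            fseek(f, (long)sh[i].sh_offset, SEEK_SET) == 0 &&
            fread(syms, sizeof *syms, nsyms, f) == nsyms) {
            for (size_t j = 0; j < nsyms; j++) {
                if (syms[j].st_name < st->sh_size &&
                    strcmp(strs + syms[j].st_name, name) == 0) {
                    result = ELF64_ST_TYPE(syms[j].st_info);  /* just 4 bits */
                    break;
                }
            }
        }
        free(strs);
        free(syms);
    }
done:
    free(sh);
    fclose(f);
    return result;
}
```

On the function.o from the question, elf64_symbol_type("function.o", "add") should return STT_FUNC. A 64-bit NASM object whose _start was declared with plain global (no :function) should give STT_NOTYPE, even though the bytes of code at that address are the same either way.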
Assembly language doesn't truly have functions, just the tools to implement that concept: e.g. jump and store a return address somewhere = call, indirect jump to a return address = ret. On x86, return addresses are pushed onto and popped from the stack.
The model of execution is purely sequential and local, one instruction at a time (on most ISAs; some ISAs are VLIW and execute, for example, 3 at a time, but still local in scope), with each instruction just making a well-defined change to the architectural state. The CPU itself doesn't know or care that it's "in a function" or anything about nesting, other than the return-address predictor stack, which optimistically assumes that ret will actually use a return address pushed by a corresponding call. But that's a performance optimization; you do sometimes get mismatched call/ret if code is doing something weird (e.g. a context switch).
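The predictor behaviour described above can be modelled in a few lines of C. This is a simplified sketch, not how any particular CPU implements it: struct cpu, do_call, do_push and do_ret are hypothetical names, the 4-entry circular predictor is deliberately tiny (real ones are larger), and there are no stack overflow checks. The real stack decides where ret actually goes; the predictor only guesses, and a wrong guess costs time, not correctness. A plain push followed by ret (using ret as an indirect jump) stands in for the "something weird" case.

```c
#define RAS_SIZE 4        /* hypothetical, tiny so overflow is easy to see */
#define STACK_WORDS 256

struct cpu {
    unsigned long stack[STACK_WORDS]; /* architectural stack: what ret really uses */
    int sp;                           /* words on the stack (no overflow checks) */
    unsigned long ras[RAS_SIZE];      /* circular return-address predictor */
    int top;                          /* next free predictor slot */
    int mispredicts;
};

/* call: push the return address; the predictor records it too. */
static void do_call(struct cpu *c, unsigned long ret_addr) {
    c->stack[c->sp++] = ret_addr;
    c->ras[c->top] = ret_addr;        /* overwrites the oldest entry when full */
    c->top = (c->top + 1) % RAS_SIZE;
}

/* A plain push: changes the real stack, but the predictor never sees it. */
static void do_push(struct cpu *c, unsigned long value) {
    c->stack[c->sp++] = value;
}

/* ret: go wherever the real stack says; count a mispredict if the
   predictor guessed a different address. */
static unsigned long do_ret(struct cpu *c) {
    c->top = (c->top + RAS_SIZE - 1) % RAS_SIZE;
    unsigned long predicted = c->ras[c->top];
    unsigned long actual = c->stack[--c->sp];
    if (predicted != actual)
        c->mispredicts++;
    return actual;
}
```

Starting from a zeroed struct cpu: two nested calls followed by two rets give 0 mispredicts. do_call, do_push, do_ret, do_ret gives 2, because the unmatched ret also leaves the predictor out of step for the following one. Six nested calls unwinding through the 4-entry predictor also give 2, for the two oldest return addresses that were overwritten.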
A C compiler won't put any instructions outside of functions.
Technically the _start entry point that indirectly calls main isn't a function: it can't return and has to make an exit system call. But that code is written in asm and is part of libc; it's not generated by the C compiler proper, only linked with the C compiler's output to make a working program. See "Linux x86 Program Start Up or - How the heck do we get to main()?" for example.