Let's say I have the following assembly program:
        .globl _start
    _start:
        mov $1, %eax
        int $0x80
And I assemble/link it with:
    $ as file.s
    $ ld a.out -o a
This runs fine and returns the status code 0 to Linux. However, when I remove the line .globl _start, I get the following warning:

    ld: warning: cannot find entry symbol _start; defaulting to 0000000000400078

What does 0000000000400078 mean? And if ld expects a _start symbol as the entry point, why is it even necessary to declare .globl _start?

> However, when I remove the line .globl _start ...

The .globl line means that the name _start is "visible" outside the file file.s. If you remove that line, the name _start can only be used inside file.s, and in a larger program (containing multiple files) you could even use the name _start in several files without a conflict.

(This is similar to static variables in C/C++: if you generate assembler code from C or C++, the difference between real global variables and static variables is that there is a .globl line for the global variables and none for the static ones. And if you are familiar with C, you know that static variables cannot be used from other files.)

The linker (ld) cannot use a name that is visible only inside one file either, so without .globl it cannot find _start.
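
To make the visibility rule concrete, here is a toy Python model of the symbol table a linker builds from several object files. It is not how ld is implemented; the `Symbol` type and `build_global_table` function are hypothetical names for illustration. Only symbols declared with .globl enter the shared table, so a local _start is invisible to the linker, and two files may each define a local _start without clashing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    """A label defined in one object file (hypothetical toy type)."""
    name: str
    is_global: bool  # True if the file declared it with .globl


def build_global_table(objects):
    """Merge the global symbols of several object files.

    `objects` maps a file name to the list of Symbols defined in it.
    Local symbols are skipped: other files (and the linker's
    entry-point lookup) never see them.
    """
    table = {}
    for filename, symbols in objects.items():
        for sym in symbols:
            if not sym.is_global:
                continue  # like a C `static`: private to its own file
            if sym.name in table:
                raise ValueError(
                    f"multiple definition of {sym.name!r} "
                    f"in {table[sym.name]} and {filename}"
                )
            table[sym.name] = filename
    return table


# Two files that each define a *local* _start: no conflict, but the
# linker's table contains no _start at all.
local_only = {
    "a.s": [Symbol("_start", is_global=False)],
    "b.s": [Symbol("_start", is_global=False)],
}
print("_start" in build_global_table(local_only))  # False

# With .globl in one file, the linker can find it.
with_globl = {"file.s": [Symbol("_start", is_global=True)]}
print(build_global_table(with_globl))  # {'_start': 'file.s'}
```

Declaring .globl _start in both a.s and b.s would make `build_global_table` raise, which mirrors the "multiple definition" error ld reports for two global definitions of the same name.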

> What does 0000000000400078 mean?

0x400078 is the address of the first byte of your program's code (the start of the .text section). If no global symbol named _start is found, ld assumes that the program starts at that first byte.
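
Where the 0x78 comes from can be checked with a little arithmetic. The sketch below assumes what the address in the question suggests: ld's default x86-64 base address of 0x400000, one 64-bit ELF header, and exactly one program header placed directly before the code. It measures the header sizes with `struct.calcsize` from the field layouts of `Elf64_Ehdr` and `Elf64_Phdr`; `first_code_address` is a hypothetical helper. Other ld versions or options can lay the file out differently and print a different address.

```python
import struct

# Field layouts of the 64-bit ELF structures ('<' = little-endian, no padding).
# Elf64_Ehdr: e_ident[16], e_type, e_machine, e_version, e_entry, e_phoff,
#             e_shoff, e_flags, e_ehsize, e_phentsize, e_phnum,
#             e_shentsize, e_shnum, e_shstrndx
ELF64_EHDR = "<16sHHIQQQIHHHHHH"
# Elf64_Phdr: p_type, p_flags, p_offset, p_vaddr, p_paddr,
#             p_filesz, p_memsz, p_align
ELF64_PHDR = "<IIQQQQQQ"

# Assumption: the default base address ld used for the question's binary.
BASE_ADDRESS = 0x400000


def first_code_address(num_program_headers: int) -> int:
    """Address of the first byte after the ELF header and program headers,
    assuming they are mapped at BASE_ADDRESS and the code follows directly."""
    ehdr_size = struct.calcsize(ELF64_EHDR)  # 64 bytes
    phdr_size = struct.calcsize(ELF64_PHDR)  # 56 bytes
    return BASE_ADDRESS + ehdr_size + num_program_headers * phdr_size


print(hex(first_code_address(1)))  # 0x400078
```

Running `readelf -h` and `readelf -l` on the linked binary shows the actual header sizes and segment layout, so the assumptions can be checked against a real file.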

> ... why is it even necessary to declare .globl _start?

It is not guaranteed that _start is located at the first byte of your program.
Counterexample:
        .globl _start
    write_stdout:               # write(1, %ecx, %edx) via int $0x80
        mov $4, %eax
        mov $1, %ebx
        int $0x80
        ret
    exit:                       # exit(0); loop if it ever returns
        mov $1, %eax
        mov $0, %ebx
        int $0x80
        jmp exit
    _start:
        mov $text, %ecx
        mov $(textend-text), %edx
        call write_stdout
        mov $text2, %ecx
        mov $(textend2-text2), %edx
        call write_stdout
        call exit
    text:
        .ascii "Hello\n"
    textend:
    text2:
        .ascii "World\n"
    textend2:

If you remove the .globl line, ld will not be able to find the _start label and will assume that your program starts at the first byte, which is the write_stdout label!
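
The counterexample can be traced with a toy model of how ld picks the entry point. It assumes .text starts at 0x400078 as in the question, and that GNU as uses the usual encodings for the instructions before _start (5 bytes for `mov $imm, %reg32`, 2 for `int $0x80`, 1 for `ret`, 2 for a short backward `jmp`). The helpers `label_addresses` and `pick_entry` are hypothetical: `pick_entry` uses _start only when it is global and otherwise falls back to the first byte of .text.

```python
TEXT_START = 0x400078  # assumption: same layout as in the question

# Instruction sizes in bytes after each label, in source order (assumed encodings).
LABELS = [
    ("write_stdout", [5, 5, 2, 1]),  # mov, mov, int $0x80, ret
    ("exit", [5, 5, 2, 2]),          # mov, mov, int $0x80, jmp exit (short)
    ("_start", []),                  # only its address matters here
]


def label_addresses(labels, text_start):
    """Assign each label the address of the first byte that follows it."""
    addresses = {}
    offset = 0
    for name, sizes in labels:
        addresses[name] = text_start + offset
        offset += sum(sizes)
    return addresses


def pick_entry(addresses, global_names, text_start):
    """Toy version of the rule: use a global _start, else the start of .text."""
    if "_start" in global_names:
        return addresses["_start"]
    return text_start


addrs = label_addresses(LABELS, TEXT_START)
print(hex(pick_entry(addrs, {"_start"}, TEXT_START)))  # 0x400093 (_start)
print(hex(pick_entry(addrs, set(), TEXT_START)))       # 0x400078 (write_stdout)
```

Without .globl, execution would begin inside write_stdout, whose `ret` has no valid return address to go back to.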
... and if you have multiple .s files in a larger program (or even a combination of .s, .c and .cc files), you cannot easily control which code ends up at the first byte of your program!