What calls _start on x86?

There is C runtime startup code which, according to https://en.wikipedia.org/wiki/Crt0, lives in the file crt0.o and is run to initialize variables before main is called. I have copied it here:

.text
    .globl _start
str : .asciz "abcd\n"
_start:
    xor %ebp, %ebp #basePointer == 0
    mov (%rsp), %edi #argc from stack
    lea 8(%rsp), %rsi #pointer to argv
    lea 16(%rsp,%rdi,8), %rdx #pointer to envp
    xor %eax, %eax
    call main
    mov %eax, %edi
    xor %eax, %eax
    call _exit

main:
    lea str(%rip), %rdi
    call puts

I have some questions regarding the implementation:

What is on the stack before _start (which should be the only entry point known to the linker) is called? I am asking because there are instructions such as mov (%rsp), %edi #argc from stack, where _start reads a value from the stack. But _start should not have any argc (only main does), nor argv and envp. All of these are arguments of the main function, not of the _start entry point. So what is on the stack before _start?
This code should initialize the variables in the .data and .bss segments, but I do not see any such initialization here. It could be related to the stack, but I do not know how. Before the variables are initialized (which should happen in crt0.o, i.e. here), they hold an initial value and the linker reserves space for them (also according to that link). In which memory section does gcc reserve space for those uninitialized variables?

Finally, how do I compile this assembly without the standard library, when it needs some of the library's functions (puts, _exit) in order to work? I have tried cc -nostdlib foo.s, but got:

/usr/bin/ld: /tmp/ccSKxoPY.o: in function `_start':
(.text+0x21): undefined reference to `_exit'
/usr/bin/ld: /tmp/ccSKxoPY.o: in function `main':
(.text+0x2d): undefined reference to `puts'
collect2: error: ld returned 1 exit status

(I cannot use the standard library, because otherwise there would be two definitions of the _start entry point.)

Solution

First of all, even on the same CPU (e.g. an x86-64 CPU), you need different crt0.S files for different operating systems.

And you need a different crt0.S for programs that are not started using an operating system (such as an operating system itself).

What is in stack before called _start which should be the only entry for linker?

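This depends on the operating system. On Linux, _start is not called by another function. The kernel (or, for dynamically linked programs, the dynamic linker) transfers control to the entry point, so there is no return address on the stack. Instead, the kernel copies argc, the arguments (argv[n]) and the environment (environ[n]) onto the new process's stack.

On x86-64, rsp points at argc on entry. It is followed by:
- the argv pointers and a NULL pointer;
- the envp pointers and another NULL pointer;
- the auxiliary vector.

The strings themselves are stored higher up on the stack.

The file from your question is intended for an operating system that places argc at rsp+0, followed by the arguments and the environment. That is why it computes envp as rsp+16+8*argc. This skips the 8-byte argc, the argc argv pointers, and the NULL pointer that terminates argv.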
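The C sketch below makes this layout concrete. It is not the real crt0. It decodes a simulated initial stack the same way the three instructions at the top of _start do.

The sketch assumes a 64-bit target where each stack slot is 8 bytes. The names struct start_args and decode_initial_stack() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the three values that the question's
   _start passes to main() in %edi, %rsi and %rdx. */
struct start_args {
    int argc;
    char **argv;
    char **envp;
};

/* Mirrors the first instructions of _start, assuming the Linux x86-64
   layout in which sp points at argc and every slot is 8 bytes wide:
     mov (%rsp), %edi           ->  argc = sp[0]
     lea 8(%rsp), %rsi          ->  argv = &sp[1]
     lea 16(%rsp,%rdi,8), %rdx  ->  envp = &sp[argc + 2]
   The slots are modelled as char * so that argv and envp can point
   straight into the array; slot 0 holds argc cast to a pointer. */
static struct start_args decode_initial_stack(char **sp)
{
    struct start_args a;
    a.argc = (int)(uintptr_t)sp[0];
    a.argv = sp + 1;              /* argv[0] .. argv[argc-1], then NULL */
    a.envp = sp + 1 + a.argc + 1; /* skip argc, argv[] and its NULL */
    return a;
}
```

In a real program, sp would be the value of rsp on entry to _start. Here a plain array stands in for it.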

However, I remember a (32-bit) OS that put argc at esp+0x80 instead of esp+0, so this is also possible...

As far as I know, Windows does not put anything on the stack (at least not officially). The corresponding crt0.S code must call a function in a DLL file, such as GetCommandLineW() in kernel32.dll, to get the command line arguments.

For device firmware that runs immediately after the CPU (microcontroller) starts, the crt0.S code must first set the stack pointer to a valid value. The memory (including the stack) is often completely uninitialized in this case.

Needless to say, the stack does not contain any useful values in this case.

This should be designed to provide initialization of variables from .data ...

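For software started by an operating system, the operating system initializes the .data section by loading its initial values from the executable file. It also provides zero-filled memory for the .bss section, which is where gcc places uninitialized (and zero-initialized) global and static variables. This means that the crt0.S code does not have to do either of these.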
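Here is a tiny C illustration of the two sections. The variable names are made up.

- With GCC 10 or later (or with -fno-common), a global variable without an initializer ends up in .bss. Such a variable takes no space in the executable file.
- A global with a non-zero initializer goes into .data.

Running nm on the object file should list them as B and D symbols respectively. Older GCC versions emitted uninitialized globals as common (C) symbols, which the linker then placed in .bss.

```c
/* No initializer: placed in .bss (GCC 10+, or with -fno-common).
   The loader (or a firmware crt0) guarantees it reads as 0 before
   main() runs. */
int uninit_counter;

/* 1 MiB, but it costs no space in the executable file: .bss only
   records its size. */
char big_buffer[1 << 20];

/* Non-zero initializer: placed in .data, and the value 42 is stored
   in the executable file. */
int init_answer = 42;
```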

For a microcontroller program (device firmware), the crt0.S code has to do this itself. Typically, it copies the initial values of .data from flash memory to RAM and fills .bss with zeros.
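Here is a minimal C sketch of that copy-and-zero step. On real firmware, the section boundaries come from symbols defined in the linker script, and their names differ between toolchains. The parameter names and the function name init_data_and_bss() are therefore hypothetical.

The sketch also assumes that:
- the stack pointer is already valid;
- both sections are 4-byte aligned and a multiple of 4 bytes long.

Many real startup files do this step in assembly instead.

```c
#include <stdint.h>

/* Hypothetical firmware startup step: copy .data's initial values
   from flash to RAM, then zero .bss. Uses only parameters and locals,
   so it does not depend on any global being initialized yet. */
static void init_data_and_bss(const uint32_t *data_load, /* initial values in flash */
                              uint32_t *data_start, uint32_t *data_end,
                              uint32_t *bss_start, uint32_t *bss_end)
{
    /* copy the initial values of .data from its load address to RAM */
    while (data_start < data_end)
        *data_start++ = *data_load++;

    /* fill .bss (uninitialized globals) with zeros */
    while (bss_start < bss_end)
        *bss_start++ = 0;
}
```

When you try it on a PC, ordinary arrays can stand in for flash and RAM.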

Because your file is obviously intended for an operating system, it does not initialize the .data section.

Finally, how to compile this assembly, without stdlib ...

If you want to use the crt0.S file from your question, you'll definitely require the _exit() function.

And if you want to use the function puts() in your code, you'll also need this function.

If you don't use the standard library, you'll have to write these functions yourself:

    ...
main:
    lea str(%rip), %rdi
    call puts
    ret

_exit:
    ...

puts:
    ...

The exact implementation depends on the operating system you use.
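For example, on Linux x86-64 (and only there), both functions can be built directly on the syscall instruction. The sketch below uses GCC/Clang inline assembly. The system call numbers 1 (write) and 231 (exit_group) are specific to x86-64 Linux.

The names my_write() and my_exit() are hypothetical. They were chosen so that the code does not clash with libc if you try it in a normal hosted build. In a -nostdlib program, you would give them the names your assembly calls (for example _exit).

```c
/* Linux x86-64 system call convention: number in rax, arguments in
   rdi, rsi, rdx; the kernel clobbers rcx and r11; result in rax. */
static long my_write(int fd, const void *buf, unsigned long len)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(1L), "D"((long)fd), "S"(buf), "d"(len)
                      : "rcx", "r11", "memory");
    return ret; /* number of bytes written, or -errno on failure */
}

/* Terminates the whole process, like _exit(); it never returns. */
static void my_exit(int status)
{
    __asm__ volatile ("syscall"
                      :
                      : "a"(231L), "D"((long)status)
                      : "rcx", "r11", "memory");
    for (;;) {
        /* not reached */
    }
}
```

Note that the raw system call reports errors as a negative errno value (for example -9, EBADF, for an invalid file descriptor). It does not return -1 and set errno like the libc wrapper.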

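puts() will be a bit tricky to implement, because it has to determine the length of the string and append a newline. write() would be easier, because on Linux it corresponds directly to a single system call.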
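To show where the extra work comes from, here is a C sketch of a puts()-like function on top of write(). The name my_puts() is hypothetical.

The sketch assumes a POSIX write() from <unistd.h>. In a -nostdlib program, you would call your own system-call wrapper instead. Unlike the real puts(), it is unbuffered.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <unistd.h>

/* Returns 0 on success and -1 on error (the real puts() returns a
   non-negative value or EOF). */
static int my_puts(const char *s)
{
    /* puts() must find the end of the string itself */
    size_t len = 0;
    while (s[len] != '\0')
        len++;

    /* write() may write fewer bytes than requested, so keep going */
    size_t done = 0;
    while (done < len) {
        ssize_t n = write(STDOUT_FILENO, s + done, len - done);
        if (n <= 0)
            return -1;
        done += (size_t)n;
    }

    /* unlike write(), puts() appends a newline */
    if (write(STDOUT_FILENO, "\n", 1) != 1)
        return -1;
    return 0;
}
```

For example, my_puts("abcd") writes abcd followed by a newline to standard output.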

Note:

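Also, don't forget the ret at the end of the main() function. Without it, the CPU continues executing whatever code follows main. (Alternatively, you can jmp to puts() instead of calling it; puts() then returns directly to _start.)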