Why do BSD systems need to sub esp,4 when performing a system call?

I'm performing a system call on OS X (32-bit) like this:

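push 123        ; the system call's only argument
mov eax, 1      ; system call number
sub esp, 4      ; the 4-byte gap in question
int 0x80        ; trap into the kernel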

And I don't quite understand that sub esp, 4 gap.

I read somewhere that BSD and its derivatives always have this gap, but couldn't find an explanation why.

My first thought was stack alignment, but that can't be it: the line shows up before every system call I've seen, and as far as I know OS X requires 16-byte stack alignment, which this code doesn't provide either.

Does anyone know the reason behind the sub esp, 4, or can you point me to resources that explain it properly?

Solution

(community wiki because I'm just summarizing comments)

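BSD does this to make libc wrapper functions for system calls more efficient. The kernel reads the arguments starting at [esp+4] and skips the dword at [esp], which is exactly where the return address sits after a CALL to the wrapper function. The wrapper can therefore just set eax and do the int 0x80 without copying args around. When you issue int 0x80 inline instead of from a called function, the sub esp, 4 reserves the slot that the return address would otherwise occupy.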
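To make that concrete, here is a minimal sketch of such a wrapper in NASM syntax. The names my_write and caller are hypothetical, and this is not the code any real libc uses. It assumes the 32-bit BSD int 0x80 convention described above and that write is system call number 4, as it is on the BSDs and 32-bit OS X.

```asm
section .data
msg     db  "hi", 10
msglen  equ $ - msg

section .text

; my_write(fd, buf, len): hypothetical wrapper, args on the stack (cdecl-style)
my_write:
    mov  eax, 4          ; system call number for write (assumed, see above)
    int  0x80            ; [esp]    = return address, skipped by the kernel
                         ; [esp+4]  = fd
                         ; [esp+8]  = buf
                         ; [esp+12] = len
    ret                  ; error handling omitted in this sketch

caller:
    push dword msglen    ; args are pushed right to left
    push dword msg
    push dword 1         ; fd 1 = stdout
    call my_write        ; CALL pushes the return address: this is the "gap"
    add  esp, 12         ; caller removes its three arguments
    ret
```

The wrapper never touches the arguments. The stack the kernel sees has the same shape as the one in the question, with the return address standing in for the dword reserved by sub esp, 4.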

It's standard on Unix/Linux systems for system calls like read(2) to actually be library wrapper functions around the kernel call, rather than macros that expand to inline asm.

Linux solves this problem a different way: it passes all syscall args in registers. I guess that means 32-bit wrapper functions have to load all the args from the stack into registers, but at least the kernel doesn't have to read them back from user-space memory.
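For comparison, a 32-bit Linux wrapper for the same call might look roughly like this. Again, my_write is a hypothetical name and this is only a sketch. It assumes the i386 Linux int 0x80 convention (call number in eax, first three args in ebx, ecx, edx) and that write is system call number 4 there.

```asm
; my_write(fd, buf, len): hypothetical wrapper for 32-bit Linux
my_write:
    push ebx             ; ebx is call-preserved, so save it first
    mov  eax, 4          ; system call number for write on i386 Linux
    mov  ebx, [esp+8]    ; fd  (offsets are 4 larger because of the push)
    mov  ecx, [esp+12]   ; buf
    mov  edx, [esp+16]   ; len
    int  0x80            ; result (or negative errno) comes back in eax
    pop  ebx             ; restore the caller's ebx
    ret
```

The three loads from the stack are the copying that the BSD convention avoids.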

The x86-64 system-call ABI is much more compatible with the function calling convention: at most a single mov r10, rcx is needed, because the System V function calling convention already passes args in registers. The syscall registers were chosen to match it as closely as possible. The one exception is the fourth argument, because the SYSCALL instruction itself overwrites RCX and R11, so the kernel can't see their original values.
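As a rough illustration, a wrapper for a four-argument call on x86-64 Linux could be as short as the sketch below. The name my_pread is hypothetical, and the sketch assumes that pread64 is system call number 17 on x86-64 Linux. Error handling is left out.

```asm
; my_pread(fd, buf, count, offset): hypothetical wrapper for x86-64 Linux
; Function ABI: rdi, rsi, rdx, rcx    Syscall ABI: rdi, rsi, rdx, r10
my_pread:
    mov  r10, rcx        ; move the 4th arg to where the kernel expects it
    mov  eax, 17         ; system call number for pread64 (assumed, see above)
    syscall              ; overwrites rcx (return RIP) and r11 (saved RFLAGS)
    ret                  ; result (or negative errno) is in rax
```

Calls with three or fewer arguments don't even need the mov r10, rcx.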

See the x86 tag wiki for more info about what the calling conventions actually are, and links to the ABIs.