I'm trying to learn a bit about assembly. I decided to start by looking at the assembly files generated from simple source code. Of course, I get bombarded by instructions whose meaning I don't know, so I start searching for them on the internet. While searching, I realized that I have no idea which assembly language I'm looking at.
Is there a way to know which assembly language gcc generates? Does this question even make sense? I am mainly interested in the assembly that my system accepts (or however I should phrase that). See below for the code generated by gcc.
If you realize which knowledge gaps I have, please link the relevant documents to read/study.
System:
OS: Windows 10 Pro
Processor: Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz 2.20 GHz
Type: 64-bit operating system, x64-based processor
//test.c
int main(){
    int x = 2;
    return 0;
}
//test.s
.file "test.c"
.text
.def __main; .scl 2; .type 32; .endef
.globl main
.def main; .scl 2; .type 32; .endef
.seh_proc main
main:
pushq %rbp
.seh_pushreg %rbp
movq %rsp, %rbp
.seh_setframe %rbp, 0
subq $48, %rsp
.seh_stackalloc 48
.seh_endprologue
call __main
movl $2, -4(%rbp)
movl $0, %eax
addq $48, %rsp
popq %rbp
ret
.seh_endproc
.ident "GCC: (Rev10, Built by MSYS2 project) 10.2.0"
GCC always produces asm output that the GNU assembler can assemble, on any platform. (GAS / GNU as is part of GNU Binutils, along with tools like ld, a linker.)
In your case, the target is x86-64 Windows (probably from x86_64-w64-mingw32-gcc), and the instruction syntax is AT&T syntax (the GCC and GAS default for x86, including x86-64).
The comment character is # in GAS for x86 (including x86-64).
Anything starting with a . is a directive; some, like .globl main to export the symbol main as visible in the .o for linking, are universal to GAS in general; check the GAS manual.
SEH directives like .seh_setframe %rbp, 0 are stack-unwind metadata for Structured Exception Handling, specific to Windows object-file formats. (You can 100% ignore them until/unless you want to learn how backtraces and exception handling work under the hood without relying on a chain of legacy frame pointers. AFAIK, it's basically equivalent to the ELF/Linux .eh_frame metadata generated from .cfi directives.)
In fact you can ignore almost all the directives. The only really important ones are section directives like .text vs. .data; .globl is somewhat important, to make linking work. That's why https://godbolt.org/ filters directives by default.
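As a rough illustration of the directives that do matter, here is a small C file whose definitions should end up in different sections. All the names (counter, scratch, helper, bump) are made up for this sketch, and exact section names depend on the target and GCC version, so treat the comments as what to look for rather than a guarantee.

```c
/* sections.c: compile with `gcc -S sections.c` and look for the
 * .text / .data / .bss / .globl directives in sections.s.
 * All names here are hypothetical, chosen only for illustration. */

int counter = 5;   /* initialized and writable: expect .data and .globl counter */
int scratch;       /* zero-initialized: expect .bss (older GCC may emit .comm instead) */

/* static means internal linkage, so no .globl line is expected for helper */
static int helper(int v)
{
    return v + 1;
}

/* external linkage: expect .text, .globl bump, and a bump: label */
int bump(void)
{
    counter = helper(counter);
    scratch += 1;
    return counter;
}
```

There is deliberately no main here: gcc -S (or gcc -c) only needs a translation unit, not a complete program.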
You can use gcc -masm=intel if you want Intel syntax / mnemonics, which you can look up in Intel's manuals (https://software.intel.com/content/www/us/en/develop/articles/intel-sdm.html / https://www.felixcloutier.com/x86/). See also the Q&A How to remove "noise" from GCC/clang assembly output? (gcc -O1 -fverbose-asm might be interesting.)
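A function that takes arguments and returns a value tends to make better study material than a main with an unused local, because with optimization enabled the compiler is free to delete int x = 2; entirely. A minimal sketch (scale_and_add and study.c are made-up names, not something from the question):

```c
/* study.c: try
 *   gcc -S -O1 -masm=intel -fverbose-asm study.c
 * and read the resulting study.s. */

/* The result depends on both arguments, so the compiler cannot
 * drop the arithmetic or fold it into a compile-time constant. */
int scale_and_add(int a, int b)
{
    return a * 2 + b;
}
```

Dropping -masm=intel from the command gives the same instructions in AT&T syntax, which makes for a handy side-by-side comparison.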
If you want to learn AT&T syntax, see https://stackoverflow.com/tags/att/info. The GAS manual also has a page about AT&T vs. Intel syntax, but it's not written as a tutorial, i.e. it assumes you know how x86 instructions work, and are looking for details on the syntax GAS uses to describe them: https://sourceware.org/binutils/docs/as/i386_002dVariations.html
(Keep in mind that the CPU actually runs machine code, and it doesn't matter how the bytes get into memory, just that they do. So different assemblers (like NASM vs. GAS) and different syntaxes (like .intel_syntax noprefix) ultimately have the same limitations on what the machine can or can't do in one instruction. All mainstream assemblers let you express pretty much everything every instruction can do; it's just a matter of knowing the syntax for immediates, addressing modes, and so on. Intel's and AMD's manuals document exactly what the CPU can do, using Intel syntax but without nailing down the details of syntax or directives.)
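To make the machine-code point concrete, here are two instructions from the question's output next to the bytes they should assemble to. The encodings were worked out by hand from the opcode tables in Intel's manual, so it is worth checking them against objdump -d on your own object file; the array and function names are invented for this sketch.

```c
#include <stddef.h>
#include <stdio.h>

/* AT&T:  movl $0, %eax        Intel:  mov eax, 0
 * Opcode B8 (mov r32, imm32 with eax as destination), then a
 * 32-bit little-endian immediate. */
const unsigned char mov_eax_0[] = { 0xB8, 0x00, 0x00, 0x00, 0x00 };

/* AT&T:  movl $2, -4(%rbp)    Intel:  mov DWORD PTR [rbp-4], 2
 * Opcode C7 /0, ModRM 0x45 ([rbp + disp8]), disp8 0xFC (-4),
 * then the 32-bit immediate 2. */
const unsigned char mov_local_2[] = { 0xC7, 0x45, 0xFC, 0x02, 0x00, 0x00, 0x00 };

/* Print a label followed by the bytes in hex. */
static void dump_bytes(const char *label, const unsigned char *code, size_t len)
{
    printf("%-20s", label);
    for (size_t i = 0; i < len; i++)
        printf(" %02x", code[i]);
    putchar('\n');
}

/* Hypothetical helper: shows that either syntax names the same bytes. */
void show_encodings(void)
{
    dump_bytes("movl $0, %eax", mov_eax_0, sizeof mov_eax_0);
    dump_bytes("movl $2, -4(%rbp)", mov_local_2, sizeof mov_local_2);
}
```

Whichever syntax you write the instruction in, the assembler is expected to emit these same bytes; only the spelling in the source file differs.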
Resources (including some linked above):