I'm very curious how assembly languages work. I'm keeping this general because I'm not talking only about Intel x86 assembly (although it's the only one I'm remotely familiar with). To be a bit more clear...
mov %eax,%ebx
How does the computer know what an instruction like "mov" does? How does it know that eax and ebx are registers? Do people write grammars for assembly languages? How do they write this? I imagine nothing is stopping someone from writing an assembly language that substitutes the mov instruction with something like dog or horse, etc. (obviously this isn't semantic at all).
Sorry if this isn't too clear, but it's something I find a bit puzzling. I know it can't be magic, but I can't see how it works. I've looked up some stuff on Wikipedia, but all it seems to say is that the assembly gets translated down to machine code. What I'm asking, I suppose, is how that translation occurs.
Thoughts?
EDIT: I realize that this stuff is defined in reference manuals and things. I guess what I wish to know is how you tell your processor "Okay, when you see mov you're gonna do this". I also know that it's probably a sequence of a ton of logic gates, but there has to be some way for the processor to recognize that mov is the symbol that means "use these logic gates".
What you see there are mnemonics, which make it easy for a programmer to write assembly; they are not, however, executable in mnemonic form. When you pass these assembly instructions through an assembler, they are translated into the machine code they represent, which is what the CPU and its various co-processors interpret and execute (the CPU generally breaks it down further into smaller units called micro-ops).
If you're curious as to how exactly it does that, well that's a long process, but this has all that information.
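To make the "recognition" step a bit less mysterious, here is a toy sketch in Python. It is only an illustration, not how real silicon is built: a real CPU does this with decoding circuitry, not software. The point is that the processor never sees the text mov, only bytes. For mov %eax,%ebx (AT&T syntax) an assembler can emit the two bytes 89 C3, where 0x89 is the opcode and 0xC3 packs the two register numbers. The names `execute` and `REG_NAMES` are made up for this example, and only the register-to-register form of opcode 0x89 is modelled.

```python
# x86 numbers its 32-bit registers 0..7 in this order.
REG_NAMES = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def execute(code: bytes, regs: dict) -> dict:
    """Decode and run bytes containing only register-to-register
    `89 /r` MOV instructions (a tiny, illustrative subset of x86)."""
    regs = dict(regs)  # work on a copy so the caller's state is untouched
    pc = 0             # "program counter": index of the next byte to decode
    while pc < len(code):
        opcode = code[pc]
        if opcode != 0x89:
            raise ValueError(f"unsupported opcode {opcode:#04x}")
        if pc + 1 >= len(code):
            raise ValueError("truncated instruction")
        modrm = code[pc + 1]
        mod = modrm >> 6            # top two bits: addressing mode
        src = (modrm >> 3) & 0b111  # middle three bits: source register
        dst = modrm & 0b111         # low three bits: destination register
        if mod != 0b11:
            raise ValueError("only register-direct operands are modelled")
        regs[REG_NAMES[dst]] = regs[REG_NAMES[src]]
        pc += 2
    return regs
```

For example, `execute(bytes([0x89, 0xC3]), {"eax": 7, "ebx": 0})` copies the value in eax into ebx. The bit pattern of the opcode byte is what selects "these logic gates"; the word mov exists only in the source file.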
All the semantics, etc. are handled by the assembler, which checks for validity and integrity where possible (one can still assemble invalid code, however!). This basically makes assembly a low-level language, even though it has a 1-to-1 correspondence with the emitted machine code (except when using macro-based assemblers, but then the macros still expand to 1-to-1 instructions).
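As a rough sketch of what the assembler side of that looks like, here is a minimal Python translator for a single instruction form. It is a simplification under stated assumptions, not a real assembler: `assemble`, `MNEMONICS` and `REG_NUMBERS` are hypothetical names, it only accepts `<mnemonic> %reg,%reg` in AT&T syntax, and it always picks the 0x89 encoding of mov. It also shows that the mnemonic is just a table key: adding dog as an alias for mov produces exactly the same bytes.

```python
import re

# x86 register numbers used in the ModRM byte.
REG_NUMBERS = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
               "esp": 4, "ebp": 5, "esi": 6, "edi": 7}

# Mnemonic -> opcode. "dog" is a made-up alias to show that the
# spelling is arbitrary; only the resulting opcode byte matters.
MNEMONICS = {"mov": 0x89, "dog": 0x89}


def assemble(line: str) -> bytes:
    """Translate one `<mnemonic> %src,%dst` line into machine code."""
    m = re.fullmatch(r"\s*(\w+)\s+%(\w+)\s*,\s*%(\w+)\s*", line)
    if m is None:
        raise ValueError(f"cannot parse: {line!r}")
    mnemonic, src, dst = m.groups()
    # This is the "validity checking" part: unknown names are rejected.
    if mnemonic not in MNEMONICS:
        raise ValueError(f"unknown mnemonic: {mnemonic}")
    if src not in REG_NUMBERS or dst not in REG_NUMBERS:
        raise ValueError(f"unknown register in: {line!r}")
    # ModRM byte: mode 11 (register-direct), then source, then destination.
    modrm = 0b11 << 6 | REG_NUMBERS[src] << 3 | REG_NUMBERS[dst]
    return bytes([MNEMONICS[mnemonic], modrm])
```

Here `assemble("mov %eax,%ebx")` and `assemble("dog %eax,%ebx")` both return the bytes 89 C3, while `assemble("horse %eax,%ebx")` raises a `ValueError` because nobody put horse in the table.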