
How to start learning assembly language on any system


I want to learn assembly, but I am not sure about one thing: because assembly is a low-level programming language, the code written to print "Hello" will be different on a Windows machine than on a Mac.

How can I get around this problem (if it exists at all), and where can I start learning the actual language?


Solution

  • First, you'll want a good grasp of programming in a language like C, as it is about as low-level as it gets before assembly. Other languages are great, but they hide many more of the low-level details.

    C has pointers, which are commonly used in assembly language, so in C we can write an array (indexing) version of some algorithm and also a pointer version of the same algorithm. It is good to know and understand these things before learning assembly.
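    The contrast between an array version and a pointer version can be sketched as follows (an illustrative example, not part of the original answer; the names `sum_indexed` and `sum_pointer` are hypothetical). The pointer version is closer to typical assembly, where a register holds an address that is advanced by the element size on each iteration.

```c
#include <assert.h>
#include <stddef.h>

/* Array version: a[i] implicitly computes the address a + i * sizeof(int). */
int sum_indexed(const int a[], size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Pointer version: p walks through memory until it reaches end
   (one past the last element). In assembly this is a register holding
   an address that is incremented by 4 (sizeof(int) on common platforms). */
int sum_pointer(const int *a, size_t n)
{
    int total = 0;
    for (const int *p = a, *end = a + n; p < end; p++)
        total += *p;
    return total;
}
```

    Both return the same result; stepping through them in a debugger and watching `i` versus `p` shows two views of the same memory.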

    Also, you'll need a good grasp of debugging: stepping line by line to watch your program run, observing variables change, observing control flow, and breaking complex statements apart into simpler ones so you can watch what's going on inside them. Debugging skills are a requirement for programming, and even more so for assembly.
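    For example (a hypothetical function, not from the original answer), a compound expression can be split into temporaries so that each intermediate value can be inspected while single-stepping. This is also close to how the expression ends up being evaluated in assembly: roughly one simple operation per instruction, with the temporaries held in registers.

```c
#include <assert.h>

/* Compact form: the intermediate values are invisible in a debugger. */
int poly_compact(int x, int a, int b, int c)
{
    return a * x * x + b * x + c;
}

/* Same computation, one operation per statement, in the same order C
   evaluates a * x * x (left to right). Each temporary can be watched,
   and each line maps roughly to a single arithmetic instruction. */
int poly_stepwise(int x, int a, int b, int c)
{
    int t1 = a * x;
    int t2 = t1 * x;
    int t3 = b * x;
    int t4 = t2 + t3;
    int t5 = t4 + c;
    return t5;
}
```

    With `x = 2, a = 3, b = 4, c = 5`, stepping through `poly_stepwise` shows `t1 = 6`, `t2 = 12`, `t3 = 8`, `t4 = 20`, `t5 = 25`, matching `poly_compact`.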


    High-level languages offer:

    • variables that are named, have scope and lifetime/duration, are typed, and can hold values according to their types
    • structured statements (control structures) that nest easily
    • expressions written in an easy, familiar (mathematical) notation

    By contrast, assembly/machine code offers:

    • physical storage that simply exists: it has no scope or lifetime/duration, is untyped, etc.
      • CPU registers
      • main memory / RAM
    • an if-goto-label style for control structures
    • instructions to manipulate the storage and compute

    These features of assembly are common to all processors.
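    To make "untyped storage" concrete in C terms (an illustration of my own, assuming `float` is a 32-bit IEEE 754 type, as it is on mainstream x86, ARM, RISC-V and MIPS systems): the same four bytes can be read back as an integer or as a floating-point number. Nothing in the storage says which; only the instruction that uses the bits gives them a type. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(uint32_t) == sizeof(float), "assumes a 32-bit float");

/* Copy the raw bits of a float into a 32-bit integer without any
   numeric conversion; memcpy just moves bytes between the two objects. */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* The reverse: reinterpret 32 bits as a float. */
float bits_to_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}
```

    Under that assumption, `float_bits(1.0f)` yields `0x3F800000`, whereas the integer 1 is stored as `0x00000001`: same kind of storage, different interpretation.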


    To learn assembly, it helps to be able to relate high-level language constructs to the capabilities of the processor. One way of doing this is to try translating small programs written in C or pseudocode into assembly.

    Especially when learning assembly language, it is always a good idea to know what you're trying to do. That means having (or writing) an algorithm first, ideally one that has been tested so it is known to work, because small design changes in C can sometimes result in major changes (e.g. a rewrite) in assembly. One way is to do this on paper using pseudocode, though I recommend writing it in a high-level language, preferably C, so that you can actually run and test your algorithm.
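    For instance, before writing any assembly you might write and test the algorithm in plain C, sticking to simple constructs (a loop, comparisons, integer arithmetic) that translate directly. Euclid's GCD is used here purely as an illustration; `gcd` and `test_gcd` are hypothetical names.

```c
#include <assert.h>

/* Euclid's algorithm (remainder form), written with only a loop,
   a comparison and integer operations, so every line has an obvious
   assembly counterpart on processors that have a divide instruction. */
unsigned gcd(unsigned a, unsigned b)
{
    while (b != 0) {
        unsigned r = a % b;
        a = b;
        b = r;
    }
    return a;
}

/* A few known answers, checked before any translation to assembly. */
void test_gcd(void)
{
    assert(gcd(48, 18) == 6);
    assert(gcd(17, 5) == 1);
    assert(gcd(0, 9) == 9);
    assert(gcd(9, 0) == 9);
}
```

    This also illustrates how a small design choice matters: LC-3 has no divide instruction, so on that processor `%` would have to become a repeated-subtraction loop, which is a noticeably different assembly program.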


    To translate an algorithm into assembly:

    1. translate data types into physical storage concepts, accounting for sizes, offsets, and alignments
    2. translate global variables into physical storage reservations
    3. translate functions into assembly:
      1. translate the parameters and local variables into physical storage, accounting for usage, lifetimes, size, and type, as well as overlap with other variables.
      2. translate structured control statements into the equivalent patterns in if-goto-label
      3. translate the expressions into machine code instructions[1]
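    Here is a rough sketch of steps 1 and 3.2 applied to a tiny example, still in C so it can be compiled and tested. The struct `point` and the functions `sum_x` and `sum_x_goto` are hypothetical. `sizeof` and `offsetof` give the sizes and byte offsets you would use as displacements in load instructions, and the goto version uses only the if-goto-label pattern that assembly provides. The exact layout depends on the platform's ABI; the offsets mentioned in the comment assume a 4-byte, 4-byte-aligned `int`, which is typical but not guaranteed.

```c
#include <assert.h>
#include <stddef.h>

/* Step 1: a data type becomes a block of bytes with fixed offsets.
   Assuming a 4-byte, 4-byte-aligned int: tag at offset 0, 3 bytes of
   padding, x at offset 4, y at offset 8, total size 12. */
struct point {
    char tag;
    int  x;
    int  y;
};

/* The algorithm as first written and tested, with structured control flow. */
int sum_x(const struct point *pts, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += pts[i].x;
    return total;
}

/* Step 3.2: the same loop in if-goto-label form, with the address of
   pts[i].x computed explicitly as base + i * size + offset, which is
   what a load instruction in assembly would use. */
int sum_x_goto(const struct point *pts, size_t n)
{
    const char *base = (const char *)pts;   /* byte address of the array */
    int total = 0;
    size_t i = 0;
loop_test:
    if (i >= n) goto loop_exit;             /* inverted loop condition */
    {
        const int *px = (const int *)(base + i * sizeof(struct point)
                                      + offsetof(struct point, x));
        total += *px;
    }
    i++;
    goto loop_test;
loop_exit:
    return total;
}
```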

    The above should give some idea of what all assembly languages have in common. Learning one assembly language means understanding the above topics, plus learning the actual instruction set of some specific processor. Much of what you learn for one processor will transfer to another, especially if you can separate the broad, common concepts above from the specifics of any given instruction set.

    Instruction sets vary in the number of registers available, the way conditional (if-goto-label) branches are performed, the sizes of immediate operands, the number of operands allowed for binary operators (two vs. three), how memory is accessed, and many other details. As others have pointed out in the comments on your question, even on the same hardware there are likely to be differences between operating systems in how registers are used and how parameters are passed.

    To start learning assembly language, I'd suggest choosing one of the simpler processors to learn the concepts of physical storage, control structure patterns, expression evaluation, and function calling. Perhaps instruction encoding as well, especially if your interests lean toward processor internals.

    Fairly simple, yet real and modern: RISC-V, which is very similar to MIPS. Both have good PC simulators as well as lots of online study material. Further, because these processors are real, compilers are available that can translate C code into assembly for you to inspect.

    Even simpler is LC-3, a very basic, easy-to-learn, educationally oriented (toy) processor with good simulator support. The downside is the lack of real compiler support; the upside is simplicity, since its limitations keep this processor very simple and digestible.

    x86 is a very common processor, though it is saddled with decades of baggage. Much of that makes sense once you understand its history and evolution; otherwise, it is arguably overly complicated to learn from scratch.


    [1] In expressions and/or statements of the high-level language, you will sometimes encounter function and/or procedure calls. Function calling is a rather advanced concept in assembly language, as there are a lot of parts involved. Parameters need to be evaluated and passed to the called function, and the current function's context must be preserved and suspended while control is transferred to the called function, which eventually returns to the caller so that it can resume. Preserving the current function's context so that it can resume after the called function returns generally involves liveness analysis: identifying the variables and temporaries that are live before the call and also used after it. Once that analysis is done, we can understand how the current function (the caller) can use registers and memory on the call stack, and this influences the function prologue and epilogue.
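    As a small, hypothetical illustration of the liveness question (the functions `helper` and `caller` are made up for this sketch): the comments mark which of the caller's values are still needed after the call. Only those must survive it, either in registers the callee promises to preserve or in the caller's stack frame; values that are dead at the call can sit in scratch registers. Which registers are preserved is defined by each platform's calling convention, so that part is not shown here.

```c
#include <assert.h>

/* A hypothetical callee; its body does not affect the caller's liveness. */
int helper(int v)
{
    return v * 2;
}

int caller(int a, int b)
{
    int t = a + b;      /* t is only the argument: dead once passed */
    int k = a - b;      /* k is used after the call: live across it */
    int r = helper(t);  /* a, b and t are not used after this point */
    return r + k;       /* needs k (preserved) and r (the return value) */
}
```

    Here only `k` is live across the call, so it is the one value the caller must arrange to keep safe; `a`, `b` and `t` need no saving at all.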