Tags: c++, pointers, assembly, addressing-mode

What addressing mode will the compiler use in the assembly generated for this code?


Suppose we have an integer variable and a character variable:

int adad=12345;
char character;

Assume a platform on which an int is at least three bytes long. I want to read the third byte of this integer and store it in the character variable, so I'd write:

character=*((char *)(&adad)+2);

I'm not a compiler or assembly expert, but I know a little about addressing modes in assembly. For that line of code, would the address of the third byte (or, better, its offset) be encoded in the generated instructions themselves, or would it be held in a separate variable whose address (or offset) appears in those instructions?


Solution

  • The best thing to do in situations like this is to try it. Here's an example program:

    int main(int argc, char **argv)
    {
      int adad=12345;
      volatile char character;
    
      character=*((char *)(&adad)+2);
    
      return 0;
    }
    

    I added volatile so that the assignment isn't optimized away entirely. Here's what the compiler produced (with -Oz on my Mac):

    _main:
        pushq   %rbp
        movq    %rsp,%rbp
        movl    $0x00003039,0xf8(%rbp)
        movb    0xfa(%rbp),%al
        movb    %al,0xff(%rbp)
        xorl    %eax,%eax
        leave
        ret
    

    The only three lines that we care about are:

        movl    $0x00003039,0xf8(%rbp)
        movb    0xfa(%rbp),%al
        movb    %al,0xff(%rbp)
    

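    The movl initializes adad (0x3039 is 12345). The first movb then reads the third byte of adad, and the second movb stores it into character (the volatile forces that store). Note that the disassembler prints the 8-bit displacements as unsigned bytes: 0xf8, 0xfa and 0xff are -8, -6 and -1, so adad lives at -8(%rbp), its third byte at -6(%rbp), and character at -1(%rbp). In other words, the offset of the third byte is encoded directly in the instruction as a displacement from %rbp; no separate variable holds it.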
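
    To make the displacement arithmetic concrete, here is a small sketch that decodes the three displacement bytes from the listing. The helper name disp8 and the constant names are made up for illustration; the only assumption is that x86-64 sign-extends 8-bit displacements, which is why bytes of 0x80 and above come out negative.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: interpret a raw 8-bit displacement byte as the signed
// offset the CPU actually applies. Bytes >= 0x80 represent negative offsets.
constexpr int disp8(std::uint8_t raw) {
    return raw >= 0x80 ? static_cast<int>(raw) - 0x100 : static_cast<int>(raw);
}

// Offsets from %rbp, taken from the three instructions in the listing.
constexpr int adad_offset       = disp8(0xf8);  // start of adad
constexpr int third_byte_offset = disp8(0xfa);  // byte read by the first movb
constexpr int character_offset  = disp8(0xff);  // byte written by the second movb

// The compiler folded "&adad + 2" into a single constant displacement.
static_assert(third_byte_offset == adad_offset + 2,
              "the load targets adad's third byte");
```

    The static_assert shows the point of the -Oz output: the "+2" never appears as a separate instruction because it has been added to adad's frame offset at compile time.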

    A good question is why it matters to you which assembly gets generated. For example, just by changing the optimization flag to -O0, the interesting part of the output becomes:

        movl    $0x00003039,0xf8(%rbp)
        leaq    0xf8(%rbp),%rax
        addq    $0x02,%rax
        movzbl  (%rax),%eax
        movb    %al,0xff(%rbp)
    

    This maps directly onto the logical operations of your code:

    1. Initialize adad
    2. Take the address of adad
    3. Add 2 to that address
    4. Load one byte by dereferencing the new address
    5. Store one byte into character
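
    The middle steps of that list can be written out one at a time in C++, which mirrors what the -O0 output does. This is only a sketch: the function names third_byte_stepwise and third_byte_memcpy are hypothetical, and the second function is an alternative that copies the bytes out instead of using a pointer cast.

```cpp
#include <cassert>
#include <cstring>

// Steps 2 to 4 of the list, one statement each. Reading an object's bytes
// through a char pointer is allowed by the language's aliasing rules.
inline char third_byte_stepwise(const int& value) {
    static_assert(sizeof(int) >= 3, "int must be at least three bytes long");
    const char* base  = reinterpret_cast<const char*>(&value);  // take the address
    const char* third = base + 2;                               // add 2 to it
    return *third;                                              // load one byte
}

// Alternative without pointer casts: copy the object's bytes into an array
// and index it. Which byte ends up at index 2 depends on the byte order.
inline char third_byte_memcpy(const int& value) {
    static_assert(sizeof(int) >= 3, "int must be at least three bytes long");
    char bytes[sizeof(int)];
    std::memcpy(bytes, &value, sizeof(int));
    return bytes[2];
}
```

    Both functions return the same byte as the original one-line expression. On a little-endian machine with a 4-byte int, 12345 (0x00003039) is stored as 39 30 00 00, so that byte is 0.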

    Different optimization settings will change the output. If you really need a specific behaviour or addressing mode for some reason, you may have to write the assembly yourself.