How does the stack differentiate between different number types?


I am trying to learn assembly and I’m having some trouble understanding memory allocation/retrieval on the stack.

When strings are allocated on the stack, the program knows to stop reading the string when it reaches the null terminating character \x00. With numbers, however, there is no such thing. How does the program know where a number allocated on the stack ends, and how does it differentiate between different number types (short, long, int)? (I’m a bit new to this so please correct me on anything I may be misunderstanding!)


Solution

  • Type (int vs. float vs. char * vs. struct foo) only really matters during translation, when the compiler is analyzing your source code and converting it to the appropriate machine code. That's when rules like "one of the operands of [] shall have pointer type and the other shall have integer type" and "the operand of unary * shall have pointer type" and "the operands of multiplicative operators shall have arithmetic type", etc., are enforced.
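
    To make the point concrete, here is a minimal sketch showing that memory itself carries no type tag. The function name bits_as_float is made up for illustration, and the sketch assumes a 4-byte IEEE 754 float, which is common but not guaranteed by the C standard. The same four bytes are an integer or a floating-point value depending only on the type the source code gives them:

```c
#include <stdint.h>
#include <string.h>

/* Assumption: float is a 4-byte IEEE 754 single, as on most current platforms. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes 4-byte float");

/* Hypothetical helper: copy the 4 bytes of an unsigned integer into a float.
   Nothing in memory says "this is an integer"; the declared type decides
   what code the compiler generates to interpret the bytes. */
float bits_as_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);   /* byte-for-byte copy, no value conversion */
    return f;
}
```

    On such a platform, bits_as_float(0x3F800000u) yields 1.0f, even though the same bit pattern read as an integer is 1065353216.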

    Assembly languages typically deal with bytes, words (2 bytes), longwords (4 bytes), etc., although some special-purpose platforms may have unusual word sizes. The opcode addb[1] adds the contents of two byte-sized entities, addl adds the contents of two longword-sized entities, and so on. When the compiler translates your code, it picks the opcodes that match each object's declared type. If you declare something as a short, the compiler will (typically) use opcodes intended for word-sized objects (addw, movw, etc.). If you declare something as an int, it will (typically) use opcodes intended for longword-sized objects (addl, movl); a long gets the same treatment where long is 4 bytes, and quadword opcodes (addq, movq) where it is 8 bytes, as on x86-64 Linux. Floating-point types are often handled with a different set of opcodes and their own set of registers.
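
    As a rough illustration of width being decided by the declared type, the sketch below adds values of three fixed widths. The function names are invented for this example. For each one the compiler selects loads, stores, and arithmetic sized for that type (or wider arithmetic followed by a truncating store); the exact instructions depend on the compiler, target, and optimization level:

```c
#include <stdint.h>

/* Byte-sized operands: the result keeps only the low 8 bits. */
uint8_t add_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* Word-sized operands: the result keeps only the low 16 bits. */
uint16_t add_u16(uint16_t a, uint16_t b)
{
    return (uint16_t)(a + b);
}

/* Longword-sized operands: the result wraps modulo 2^32. */
uint32_t add_u32(uint32_t a, uint32_t b)
{
    return a + b;
}
```

    For example, add_u8(200, 100) returns 44 and add_u16(60000, 10000) returns 4464, because each result is cut down to the width of its type. Nothing stored alongside the value records that width; it is fixed by the generated code.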

    In short, the assembly language "knows" where and how big things are by virtue of the opcodes the compiler specified.

    Simple example - here's some C source code that works with an int and a short:

    #include <stdio.h>
    
    int main( void )
    {
      int x;
      short y;
    
      printf( "Gimme an x: " );
      scanf( "%d", &x );
    
      y = 2 * x + 30;
    
      printf( "x = %d, y = %hd\n", x, y );
      return 0;
    }
    

    I used the -Wa,-aldh option with gcc to generate a listing of the assembly code with the source code interleaved, giving me

    GAS LISTING /tmp/cc3D25hf.s             page 1
    
    
       1                    .file   "simple.c"
       2                    .text
       3                .Ltext0:
       4                    .section    .rodata
       5                .LC0:
       6 0000 47696D6D      .string "Gimme an x: "
       6      6520616E 
       6      20783A20 
       6      00
       7                .LC1:
       8 000d 256400        .string "%d"
       9                .LC2:
      10 0010 78203D20      .string "x = %d, y = %hd\n"
      10      25642C20 
      10      79203D20 
      10      2568640A 
      10      00
      11                    .text
      12                    .globl  main
      14                main:
      15                .LFB0:
      16                    .file 1 "simple.c"
       1:simple.c      **** #include <stdio.h>
       2:simple.c      **** 
       3:simple.c      **** int main( void )
       4:simple.c      **** {
      17                    .loc 1 4 0
      18                    .cfi_startproc
      19 0000 55            pushq   %rbp
      20                    .cfi_def_cfa_offset 16
      21                    .cfi_offset 6, -16
      22 0001 4889E5        movq    %rsp, %rbp
      23                    .cfi_def_cfa_register 6
      24 0004 4883EC10      subq    $16, %rsp
       5:simple.c      ****   int x;
       6:simple.c      ****   short y;
       7:simple.c      **** 
       8:simple.c      ****   printf( "Gimme an x: " );
      25                    .loc 1 8 0
      26 0008 BF000000      movl    $.LC0, %edi
      26      00
      27 000d B8000000      movl    $0, %eax
      27      00
      28 0012 E8000000      call    printf
      28      00
       9:simple.c      ****   scanf( "%d", &x );
      29                    .loc 1 9 0
      30 0017 488D45F8      leaq    -8(%rbp), %rax
      31 001b 4889C6        movq    %rax, %rsi
      32 001e BF000000      movl    $.LC1, %edi
      32      00
      33 0023 B8000000      movl    $0, %eax
      33      00
      34 0028 E8000000      call    __isoc99_scanf
      34      00
      10:simple.c      **** 
      11:simple.c      ****   y = 2 * x + 30;
    
    GAS LISTING /tmp/cc3D25hf.s             page 2
    
    
      35                    .loc 1 11 0
      36 002d 8B45F8        movl    -8(%rbp), %eax
      37 0030 83C00F        addl    $15, %eax
      38 0033 01C0          addl    %eax, %eax
      39 0035 668945FE      movw    %ax, -2(%rbp)
      12:simple.c      **** 
      13:simple.c      ****   printf( "x = %d, y = %hd\n", x, y );
      40                    .loc 1 13 0
      41 0039 0FBF55FE      movswl  -2(%rbp), %edx
      42 003d 8B45F8        movl    -8(%rbp), %eax
      43 0040 89C6          movl    %eax, %esi
      44 0042 BF000000      movl    $.LC2, %edi
      44      00
      45 0047 B8000000      movl    $0, %eax
      45      00
      46 004c E8000000      call    printf
      46      00
      14:simple.c      ****   return 0;
      47                    .loc 1 14 0
      48 0051 B8000000      movl    $0, %eax
      48      00
      15:simple.c      **** }
      49                    .loc 1 15 0
      50 0056 C9            leave
      51                    .cfi_def_cfa 7, 8
      52 0057 C3            ret
      53                    .cfi_endproc
      54                .LFE0:
      56                .Letext0:
      57                    .file 2 "/usr/lib/gcc/x86_64-redhat-linux/7/include/stddef.h"
      58                    .file 3 "/usr/include/bits/types.h"
      59                    .file 4 "/usr/include/libio.h"
      60                    .file 5 "/usr/include/stdio.h"
    

    If you look at the lines

      36 002d 8B45F8        movl    -8(%rbp), %eax
      37 0030 83C00F        addl    $15, %eax
      38 0033 01C0          addl    %eax, %eax
      39 0035 668945FE      movw    %ax, -2(%rbp)
    

    that's the machine code for

    y = 2 * x + 30;
    

    When it's dealing with x, it uses opcodes for longwords (the compiler has rewritten 2 * x + 30 as the equivalent (x + 15) * 2):

    movl    -8(%rbp), %eax  # copy the value in x to the eax register
    addl    $15, %eax       # add the literal value 15 to the value in eax
    addl    %eax, %eax      # add eax to itself, i.e. multiply the value in eax by 2
    

    When it's dealing with y, it uses opcodes for words:

    movw    %ax, -2(%rbp)   # store the lower 2 bytes of eax (the ax register) in y
    

    So that's how it "knows" how many bytes to read for a given object - all that information is baked into the machine code itself. On a given platform, scalar types all have fixed sizes that are known at compile time, so it's just a matter of picking the correct opcode or opcodes to use.
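
    The effect of the word-sized store and load in the listing can be mimicked in C. This is only a sketch: the function name is invented, and it assumes a 16-bit short like the one in the listing above. It keeps the low 2 bytes of a 32-bit value (what movw %ax, -2(%rbp) stores) and then sign-extends them back to 32 bits (what movswl -2(%rbp), %edx loads):

```c
#include <stdint.h>

/* Hypothetical helper: returns what a 32-bit result looks like after being
   stored into a 2-byte slot and read back as a signed 16-bit value. */
int32_t store_and_reload_as_short(int32_t value)
{
    /* movw: only the low 16 bits are written to the 2-byte slot. */
    uint16_t low = (uint16_t)((uint32_t)value & 0xFFFFu);

    /* movswl: sign-extend 16 bits to 32, written out by hand to avoid
       relying on implementation-defined narrowing conversions. */
    return (int32_t)(low ^ 0x8000u) - 0x8000;
}
```

    With x = 50, the result 2 * x + 30 = 130 fits in 16 bits and comes back unchanged. With x = 20000, the result 40030 does not fit in a signed 16-bit short, and the reloaded value is -25506: the instructions only ever touched 2 bytes, because that is the width encoded in them.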


    1. I'm using mnemonics specific to Intel x86 processors (in the AT&T syntax that gcc emits), but the concept is the same for other architectures and assemblers.