How does the stack differentiate between different number types?

I am trying to learn assembly and I’m having some trouble understanding memory allocation/retrieval on the stack.

When strings are allocated on the stack, the program knows to stop reading the string when it reaches the null terminating character \x00. Numbers have no such terminator. How does the program know where a number allocated on the stack ends, and how does it differentiate between different number types (short, long, int)? (I’m a bit new to this, so please correct me on anything I may be misunderstanding!)

Solution

Type (int vs. float vs. char * vs. struct foo) only really matters during translation, when the compiler is analyzing your source code and converting it to the appropriate machine code. That's when rules like "one of the operands of [] shall have pointer type and the other shall have integer type" and "the operand of unary * shall have pointer type" and "the operands of multiplicative operators shall have arithmetic type", etc., are enforced.

Assembly languages typically deal with bytes, words (2 bytes), longwords (4 bytes), quadwords (8 bytes), etc., although some special-purpose platforms may have unusual word sizes. The opcode addb¹ adds the contents of two byte-sized entities, addl adds the contents of two longword-sized entities, etc. When the compiler translates your code, it picks the opcodes that match each object's declared type. If you declare something as a short, the compiler will (typically) use opcodes intended for word-sized objects (addw, movw, etc.). If you declare something as an int, it will (typically) use opcodes intended for longword-sized objects (addl, movl). A long gets the same treatment where it is 4 bytes wide; where it is 8 bytes wide, as on x86-64 Linux, the compiler uses quadword opcodes (addq, movq) instead. Floating-point types are often handled with a different set of opcodes and their own set of registers.

In short, the machine code "knows" where and how big things are by virtue of the opcodes the compiler selected.

Simple example - here's some C source code that works with an int and a short:

#include <stdio.h>

int main( void )
{
  int x;
  short y;

  printf( "Gimme an x: " );
  scanf( "%d", &x );

  y = 2 * x + 30;

  printf( "x = %d, y = %hd\n", x, y );
  return 0;
}

I used the -Wa,-aldh option with gcc to generate a listing of the assembly code with the source code interleaved, giving me

GAS LISTING /tmp/cc3D25hf.s             page 1


   1                    .file   "simple.c"
   2                    .text
   3                .Ltext0:
   4                    .section    .rodata
   5                .LC0:
   6 0000 47696D6D      .string "Gimme an x: "
   6      6520616E 
   6      20783A20 
   6      00
   7                .LC1:
   8 000d 256400        .string "%d"
   9                .LC2:
  10 0010 78203D20      .string "x = %d, y = %hd\n"
  10      25642C20 
  10      79203D20 
  10      2568640A 
  10      00
  11                    .text
  12                    .globl  main
  14                main:
  15                .LFB0:
  16                    .file 1 "simple.c"
   1:simple.c      **** #include <stdio.h>
   2:simple.c      **** 
   3:simple.c      **** int main( void )
   4:simple.c      **** {
  17                    .loc 1 4 0
  18                    .cfi_startproc
  19 0000 55            pushq   %rbp
  20                    .cfi_def_cfa_offset 16
  21                    .cfi_offset 6, -16
  22 0001 4889E5        movq    %rsp, %rbp
  23                    .cfi_def_cfa_register 6
  24 0004 4883EC10      subq    $16, %rsp
   5:simple.c      ****   int x;
   6:simple.c      ****   short y;
   7:simple.c      **** 
   8:simple.c      ****   printf( "Gimme an x: " );
  25                    .loc 1 8 0
  26 0008 BF000000      movl    $.LC0, %edi
  26      00
  27 000d B8000000      movl    $0, %eax
  27      00
  28 0012 E8000000      call    printf
  28      00
   9:simple.c      ****   scanf( "%d", &x );
  29                    .loc 1 9 0
  30 0017 488D45F8      leaq    -8(%rbp), %rax
  31 001b 4889C6        movq    %rax, %rsi
  32 001e BF000000      movl    $.LC1, %edi
  32      00
  33 0023 B8000000      movl    $0, %eax
  33      00
  34 0028 E8000000      call    __isoc99_scanf
  34      00
  10:simple.c      **** 
  11:simple.c      ****   y = 2 * x + 30;

GAS LISTING /tmp/cc3D25hf.s             page 2


  35                    .loc 1 11 0
  36 002d 8B45F8        movl    -8(%rbp), %eax
  37 0030 83C00F        addl    $15, %eax
  38 0033 01C0          addl    %eax, %eax
  39 0035 668945FE      movw    %ax, -2(%rbp)
  12:simple.c      **** 
  13:simple.c      ****   printf( "x = %d, y = %hd\n", x, y );
  40                    .loc 1 13 0
  41 0039 0FBF55FE      movswl  -2(%rbp), %edx
  42 003d 8B45F8        movl    -8(%rbp), %eax
  43 0040 89C6          movl    %eax, %esi
  44 0042 BF000000      movl    $.LC2, %edi
  44      00
  45 0047 B8000000      movl    $0, %eax
  45      00
  46 004c E8000000      call    printf
  46      00
  14:simple.c      ****   return 0;
  47                    .loc 1 14 0
  48 0051 B8000000      movl    $0, %eax
  48      00
  15:simple.c      **** }
  49                    .loc 1 15 0
  50 0056 C9            leave
  51                    .cfi_def_cfa 7, 8
  52 0057 C3            ret
  53                    .cfi_endproc
  54                .LFE0:
  56                .Letext0:
  57                    .file 2 "/usr/lib/gcc/x86_64-redhat-linux/7/include/stddef.h"
  58                    .file 3 "/usr/include/bits/types.h"
  59                    .file 4 "/usr/include/libio.h"
  60                    .file 5 "/usr/include/stdio.h"

If you look at the lines

  36 002d 8B45F8        movl    -8(%rbp), %eax
  37 0030 83C00F        addl    $15, %eax
  38 0033 01C0          addl    %eax, %eax
  39 0035 668945FE      movw    %ax, -2(%rbp)

that's the machine code for

y = 2 * x + 30;

When it's dealing with x, it uses opcodes for longwords:

movl    -8(%rbp), %eax ;; copy the value in x to the eax register
addl    $15, %eax      ;; add the literal value 15 to the value in eax
addl    %eax, %eax     ;; add eax to itself, i.e. multiply by 2: 2 * (x + 15) = 2 * x + 30

When it's dealing with y, it uses opcodes for words:

movw    %ax, -2(%rbp)  ;; save the value in the lower 2 bytes of eax to y

So that's how it "knows" how many bytes to read for a given object: all that information is baked into the machine code itself. Every scalar type has a fixed size that the compiler knows for the target platform, so it's just a matter of picking the correct opcode or opcodes to use.

¹ These are x86 mnemonics written in AT&T (GAS) syntax, but the concept is the same for other architectures and assemblers.