
Why does the C "long" data type compile to two MSP430 ".word"s?


I understand that:

char (1 byte)
short (2 bytes)
long (4 bytes)
long long (8 bytes)

But when converting C to assembly, why is there an extra .word 0 or .word -1?


Solution

  • The size of a C type is chosen by the compiler's author for each target. The language itself sets only minimum ranges, not exact sizes. One version of a compiler can make int 16 bits for one target and 32 bits for another, and two different compilers for the same target can disagree, one choosing 16 bits and the other 32. Sizes also don't have to line up with the general-purpose register size; that too is the author's choice.

    This is what stdint.h is all about. It ships as part of the compiler and connects the fixed sizes (8, 16, 32, 64 bits, and so on) to whichever native types that compiler uses for that target. The stdint.h from a given version of gcc for x86 is not expected to be interchangeable with the stdint.h from the same version of gcc for msp430, for example.
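As a minimal sketch of that point, the helpers below report the widths a compiler picked for int and long next to the fixed-width types from stdint.h. The function names (bits_of, width_of_int, and so on) are made up for illustration, and the sketch assumes the target provides int16_t and int32_t, as mainstream toolchains do.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert a size in bytes to a width in bits. */
static size_t bits_of(size_t n_bytes)
{
    assert(n_bytes > 0);
    return n_bytes * CHAR_BIT;
}

/* Widths of the native types: these depend on the compiler and target. */
size_t width_of_int(void)  { return bits_of(sizeof(int)); }
size_t width_of_long(void) { return bits_of(sizeof(long)); }

/* Widths of the stdint.h types: these are fixed wherever the types exist. */
size_t width_of_int16(void) { return bits_of(sizeof(int16_t)); }
size_t width_of_int32(void) { return bits_of(sizeof(int32_t)); }
```

On a typical msp430 toolchain width_of_int() would be expected to return 16, and on a typical x86 toolchain 32, while width_of_int16() and width_of_int32() return 16 and 32 on both.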

    What appears to be going on here is what you described: this compiler uses the following sizes for this target.

    char (1 byte)
    short (2 bytes)
    long (4 bytes)
    long long (8 bytes)
    

    Assembly language is specific to the assembler (the tool), not the target. The author of an assembler can choose whatever syntax, mnemonics, and directives they like. Staying close to the chip documentation is the sane path, but there are no rules for assembly language, in particular for how data items are defined. Here it appears that .word means a 16-bit value and .byte an 8-bit value.

    2048 = 0x0000...0800
    -2048 = 0xFFFF...F800
    

    So if you keep only the lower 8 bits of 2048 you get 0x00, the lower 16 bits give 0x0800, and the lower 32 bits give 0x00000800. As data definitions that is:

    .byte 0x00
    
    .word 0x0800
    

    and, assuming little endian (least significant word first):

    .word 0x0800
    .word 0x0000
    

    for 8, 16, and 32 bits respectively.

    In decimal:

    .byte 0
    
    .word 2048
    
    .word 2048
    .word 0
    

    or

    .word 2048,0
    

    depending on the assembler's syntax.

    For the negative version, -2048, in two's complement:

    .byte 0x00
    
    .word 0xF800
    
    .word 0xF800
    .word 0xFFFF
    

    for the 8-, 16-, and 32-bit versions of that number.

    In decimal:

    .byte 0
    
    .word -2048
    
    .word -2048
    .word -1
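The word values above can be reproduced with a short C sketch. It assumes two's complement and the least-significant-word-first layout described earlier; split_words and as_signed16 are hypothetical helpers for illustration, not part of any toolchain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Split a two's-complement value into n 16-bit words, least significant
   word first, the way consecutive .word directives would lay it out.
   n = 2 models a 32-bit long, n = 4 a 64-bit long long. */
void split_words(int64_t value, uint16_t *out, size_t n)
{
    assert(n <= 4);                       /* at most 64 bits of input */
    uint64_t bits = (uint64_t)value;      /* well-defined modular conversion */
    for (size_t i = 0; i < n; i++) {
        out[i] = (uint16_t)(bits >> (16 * i));  /* keep the low 16 bits */
    }
}

/* Read a 16-bit pattern as the signed number a listing would print,
   for example 0xFFFF as -1 and 0xF800 as -2048. */
long as_signed16(uint16_t w)
{
    return w >= 0x8000u ? (long)w - 65536L : (long)w;
}
```

Calling split_words(2048, w, 2) fills w with 0x0800, 0x0000, and split_words(-2048, w, 2) fills it with 0xF800, 0xFFFF, which as_signed16 reads back as -2048 and -1.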
    

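    This is where the extra .word 0 and .word -1 come from: they are the upper 16 bits of a 32-bit long. For a value that fits in 16 bits, the upper word is all zeros when the value is non-negative and all ones (sign extension) when it is negative. A long long -2048 would be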

    .word -2048
    .word -1
    .word -1
    .word -1
    

    Or a long long -2048 could equally be emitted as:

    .byte 0
    .byte -8
    .byte -1
    .byte -1
    .byte -1
    .byte -1
    .byte -1
    .byte -1
    

    Both generate exactly the same data in the binary.
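A small sketch can check that claim, assuming a little-endian layout in which each 16-bit word is stored low byte first. The helper names emit_bytes and emit_words_as_bytes are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Lay out 'value' as n bytes, least significant byte first,
   as a run of .byte directives would. */
void emit_bytes(int64_t value, uint8_t *out, size_t n)
{
    assert(n <= 8);
    uint64_t bits = (uint64_t)value;
    for (size_t i = 0; i < n; i++) {
        out[i] = (uint8_t)(bits >> (8 * i));
    }
}

/* Lay out 'value' as n_words 16-bit words, each stored low byte first,
   as a run of .word directives would on a little-endian target. */
void emit_words_as_bytes(int64_t value, uint8_t *out, size_t n_words)
{
    assert(n_words <= 4);
    uint64_t bits = (uint64_t)value;
    for (size_t i = 0; i < n_words; i++) {
        uint16_t w = (uint16_t)(bits >> (16 * i));
        out[2 * i]     = (uint8_t)(w & 0xFFu);  /* low byte of the word */
        out[2 * i + 1] = (uint8_t)(w >> 8);     /* high byte of the word */
    }
}
```

For -2048, both functions write the bytes 0x00, 0xF8, then six bytes of 0xFF. Read as signed bytes, these are the 0, -8, -1, ... values in the .byte listing.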