Why does the C "long" data type compile to two MSP430 ".word"s?

I understand that:

char (1 byte)
short (2 bytes)
long (4 bytes)
long long (8 bytes)

But when the C is compiled to assembly, why is there an extra .word 0 or .word -1?

Solution

The size of a C variable type is whatever the compiler author chose for that compiler and target. The language itself only sets minimum ranges, not fixed sizes. The same compiler (even the same version) can make an int 16 bits for one target and 32 for another, and two different compilers for the same target can disagree, one choosing 16 bits and the other 32. The sizes don't have to line up with the general-purpose register size either; that is also the author's choice.

This is what stdint.h is all about. It is ultimately part of the compiler, and it connects the dots between the fixed widths (8, 16, 32, 64, etc.) and the sizes that compiler chose for that target. The stdint.h from a specific version of gcc for x86 is not expected to be compatible with the stdint.h from the same version of gcc for msp430, for example.
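A minimal sketch of that point, assuming a hosted C99 compiler that provides the exact-width types. The helper name print_type_widths and the macro BITS_OF are made up for illustration. The first group of lines prints whatever this particular compiler chose; the second group holds on any compiler that defines those names.

```c
#include <assert.h>
#include <limits.h>   /* CHAR_BIT */
#include <stdint.h>   /* int8_t, int16_t, int32_t, int64_t */
#include <stdio.h>

/* Width of a type in bits, as chosen by this compiler for this target. */
#define BITS_OF(type) ((unsigned)(sizeof(type) * CHAR_BIT))

/* print_type_widths is a hypothetical helper name, not a library function. */
static void print_type_widths(void)
{
    /* These widths vary by compiler and target. */
    printf("char      : %u bits\n", BITS_OF(char));
    printf("short     : %u bits\n", BITS_OF(short));
    printf("int       : %u bits\n", BITS_OF(int));
    printf("long      : %u bits\n", BITS_OF(long));
    printf("long long : %u bits\n", BITS_OF(long long));

    /* These do not vary: where the exact-width names exist, they are exact. */
    assert(BITS_OF(int8_t) == 8);
    assert(BITS_OF(int16_t) == 16);
    assert(BITS_OF(int32_t) == 32);
    assert(BITS_OF(int64_t) == 64);
}
```

Calling print_type_widths() from main on two different targets is a quick way to see the plain types move while the stdint.h names stay put.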

What appears to be going on here, for this compiler and target, is as you described:

char (1 byte)
short (2 bytes)
long (4 bytes)
long long (8 bytes)

Assembly language is specific to the assembler (the tool), not the target. The author of the assembler can choose whatever syntax, mnemonics, etc., they like. Staying close to the chip documentation is the sane path, but there are certainly no rules for assembly language, in particular for how you define data items. It appears that here .word means a 16-bit value and .byte an 8-bit value.

2048 = 0x0000....00800
-2048 = 0xFFFF....FF800

So if you keep only the lower 8 bits of 2048 you get 0x00, keep the lower 16 and you get 0x0800, keep the lower 32 and you get 0x00000800, so:

.byte 0x00

.word 0x0800

assuming little endian:

.word 0x0800
.word 0x0000

for 8, 16, and 32 bits

In decimal:

.byte 0

.word 2048

.word 2048
.word 0

or

.word 2048,0

depending on the assembler's syntax
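The truncation above can be sketched in C with masks and shifts. This is an illustration only, not what the compiler does internally; the names low8, low16, high16 and check_2048_example are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Low 8 bits of the value: what a single .byte would hold. */
static uint8_t low8(int32_t value)
{
    return (uint8_t)((uint32_t)value & 0xFFu);
}

/* Low 16 bits of the value: what the first .word would hold. */
static uint16_t low16(int32_t value)
{
    return (uint16_t)((uint32_t)value & 0xFFFFu);
}

/* Bits 16..31: the second .word of a 32-bit long, little endian. */
static uint16_t high16(int32_t value)
{
    return (uint16_t)(((uint32_t)value >> 16) & 0xFFFFu);
}

/* Reproduces the 2048 values listed in the text. */
static void check_2048_example(void)
{
    assert(low8(2048) == 0x00);      /* .byte 0x00   */
    assert(low16(2048) == 0x0800);   /* .word 0x0800 */
    assert(high16(2048) == 0x0000);  /* .word 0x0000 */
}
```

The casts to uint32_t make the shifts and masks well defined for negative inputs too.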

for the negative version -2048

.byte 0x00

.word 0xF800

.word 0xF800
.word 0xFFFF

for 8, 16, and 32 bit versions of that number

in decimal

.byte 0

.word -2048

.word -2048
.word -1

and a long long -2048 would be

.word -2048
.word -1
.word -1
.word -1
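The -1 words are just sign extension: every bit above the low 16 is a copy of the sign bit, and a 16-bit word of all ones reads as -1 in signed decimal. A small sketch, assuming two's complement and little-endian word order as in the text (split_words and check_minus_2048 are hypothetical names):

```c
#include <assert.h>
#include <stdint.h>

/* Split a 64-bit value into four 16-bit words, least significant first,
   and report each word as a signed 16-bit number. */
static void split_words(int64_t value, int16_t words[4])
{
    uint64_t bits = (uint64_t)value;   /* two's complement bit pattern */
    for (int i = 0; i < 4; i++) {
        uint16_t w = (uint16_t)(bits >> (16 * i));
        /* Reinterpret the 16-bit pattern as signed without relying on
           implementation-defined narrowing. */
        words[i] = (w >= 0x8000u) ? (int16_t)((int32_t)w - INT32_C(65536))
                                  : (int16_t)w;
    }
}

/* Reproduces the long long -2048 listing from the text. */
static void check_minus_2048(void)
{
    int16_t words[4];
    split_words(-2048, words);
    assert(words[0] == -2048);   /* .word -2048 */
    assert(words[1] == -1);      /* .word -1    */
    assert(words[2] == -1);      /* .word -1    */
    assert(words[3] == -1);      /* .word -1    */
}
```

For a 32-bit long only the first two words would be emitted, which gives the .word -2048 followed by .word -1 from the question.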

or long long -2048 could also be implemented as:

.byte 0
.byte -8
.byte -1
.byte -1
.byte -1
.byte -1
.byte -1
.byte -1

Both generate exactly the same data in the binary, again assuming little endian.
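To convince yourself the two listings are the same data, you can lay the four words out as bytes, low byte first, and compare against the byte listing. This sketch does the layout by hand, so it does not depend on the endianness of the machine running it; words_to_le_bytes and check_forms_match are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Lay out 16-bit words as bytes, low byte first (little endian). */
static void words_to_le_bytes(const uint16_t *words, size_t count, uint8_t *out)
{
    for (size_t i = 0; i < count; i++) {
        out[2 * i]     = (uint8_t)(words[i] & 0xFFu);
        out[2 * i + 1] = (uint8_t)(words[i] >> 8);
    }
}

/* Compares the .word form and the .byte form of long long -2048. */
static void check_forms_match(void)
{
    /* .word -2048, -1, -1, -1 written as unsigned 16-bit patterns */
    const uint16_t words[4] = { 0xF800u, 0xFFFFu, 0xFFFFu, 0xFFFFu };
    /* .byte 0, -8, -1, -1, -1, -1, -1, -1 written as unsigned 8-bit patterns */
    const uint8_t bytes[8] = { 0x00u, 0xF8u, 0xFFu, 0xFFu,
                               0xFFu, 0xFFu, 0xFFu, 0xFFu };
    uint8_t laid_out[8];

    words_to_le_bytes(words, 4, laid_out);
    assert(memcmp(laid_out, bytes, sizeof bytes) == 0);
}
```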