I mean something that I would write in NASM like this:
mov dword [0xA0BF17C], ' : )'
I have tried things like these in the GNU assembler:
movd " : )", 0xB8000
movd $" : )", 0xB8000
movd ' : )', 0xB8000
movd " : )", $0xB8000
But they all caused this error:
Error: unbalanced parenthesis in operand 1.
GAS only supports single-character literals as numbers. A single multi-byte UTF-8 character is OK, but multiple separate characters are not. You could do movb $' ', 0xB8000 for each byte, but you don't want to use 4 instructions for 4 bytes.
You have two real options: combine single-character literals into one number with shifts, or write the number out in hex. (Either way, take into account that x86 is little-endian.)
# NASM equivalent: mov dword [0xB8000], "abcd"
movl $'a' + ('b'<<8) + ('c'<<16) + ('d'<<24), 0xB8000
movl $0x64636261, 0xB8000 # or manual ASCII -> hex, little-endian
The shift/add trick works with arbitrary bytes; you could maybe even make a #define CPP macro (taking 4 args) to do it.
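As a minimal sketch of that macro idea, here it is written and checked as plain C. The names PACK4 and store_bytes are hypothetical, not something GAS or binutils provides. The macro uses only shifts and +, the same expression syntax as the movl line above, so it should in principle also work in a .S file run through the C preprocessor (e.g. movl $PACK4('a','b','c','d'), 0xB8000), though that assembler usage is untested here.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical macro: pack four character literals into one 32-bit value.
 * The first argument becomes the low byte, so when the value is stored
 * to memory on little-endian x86, the bytes appear in argument order.
 * In C, keep the arguments below 0x80: shifting a larger int left by 24
 * would overflow a signed int. */
#define PACK4(a, b, c, d) ((a) + ((b) << 8) + ((c) << 16) + ((d) << 24))

/* Hypothetical helper: copy a 32-bit value into 4 bytes exactly as it is
 * laid out in memory, i.e. what a movl store to 0xB8000 would leave there. */
static void store_bytes(uint32_t value, unsigned char out[4]) {
    memcpy(out, &value, sizeof value);
}
```

Here PACK4('a', 'b', 'c', 'd') evaluates to 0x64636261, the same constant as the hand-written hex above, and store_bytes shows those bytes landing in memory as a, b, c, d on a little-endian host such as x86.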
With an EAX destination instead of memory (to simplify the machine code), disassembled back into GAS Intel syntax (objdump -drwC -Mintel), we can see they both assembled identically (with as --32):
0: b8 61 62 63 64 mov eax,0x64636261
5: b8 61 62 63 64 mov eax,0x64636261
Or with your memory destination, again assembled in 32-bit mode: in real mode this would #GP fault, because the 0xb8000 offset exceeds the 64k DS segment limit.
Also notice that the immediate bytes in the machine code are in the same order in which they will be stored as data to the memory destination. (And they match the source order if you were using NASM mov dst, "abcd".)
a: c7 05 00 80 0b 00 61 62 63 64 mov DWORD PTR ds:0xb8000,0x64636261
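To make that byte order concrete, here is a small C sketch that rebuilds the 10-byte instruction from the listing (the function names are hypothetical). It assumes 32-bit mode, where ModRM byte 0x05 means an absolute disp32 address (in 64-bit mode the same encoding would be RIP-relative). Both the displacement 0xb8000 and the immediate 0x64636261 are emitted least-significant byte first, which is why the immediate bytes read 61 62 63 64.

```c
#include <stddef.h>
#include <stdint.h>

/* Write v into buf starting at index len, least-significant byte first
 * (x86 encodes displacements and immediates little-endian).
 * Returns the index just past the written bytes. */
static size_t put_le32(unsigned char *buf, size_t len, uint32_t v) {
    for (int i = 0; i < 4; i++) {
        buf[len++] = (unsigned char)(v >> (8 * i));
    }
    return len;
}

/* Encode the 32-bit-mode instruction mov DWORD PTR ds:disp, imm
 * (AT&T: movl $imm, disp):
 *   C7   opcode for MOV r/m32, imm32 (/0)
 *   05   ModRM: mod=00, reg=000, r/m=101 -> absolute disp32
 *   then disp32 and imm32, both little-endian.
 * buf must have room for at least 10 bytes; returns the length written. */
static size_t encode_mov_mem_imm32(unsigned char *buf, uint32_t disp, uint32_t imm) {
    size_t len = 0;
    buf[len++] = 0xC7;
    buf[len++] = 0x05;
    len = put_le32(buf, len, disp);
    len = put_le32(buf, len, imm);
    return len;
}
```

Calling encode_mov_mem_imm32(buf, 0xB8000, 0x64636261) fills buf with c7 05 00 80 0b 00 61 62 63 64, matching the objdump line. The EAX form in the earlier listing follows the same pattern with a shorter encoding: opcode b8 (mov eax, imm32) followed directly by the four immediate bytes.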
Unlike NASM, GAS doesn't support multi-character character literals as numeric constants. Not only are they unsupported, they even confuse GAS's parser (see footnote 1)! GAS was mostly designed for assembling compiler output, and compilers don't need this.
GAS only supports (double) quoted strings of multiple characters as args to .ascii / .asciz / .string8/16/32, not to .byte (unlike NASM db) or as an immediate operand for an instruction.
If it were supported, the x86 AT&T syntax would be movl $' : )', 0xB8000, not movd (an MMX/SSE instruction), and an immediate operand always needs a $.
See "When using the MOV mnemonic to load/copy a string to a memory register in MASM, are the characters stored in reverse order?" for a comparison of NASM vs. MASM vs. GAS with multi-character literals. Only NASM works intuitively.
Double quotes don't work either: mov $"foo", %eax assembles, but it assembles the same as mov $foo, %eax, putting the address of the symbol foo into a register. See "relocation R_X86_64_8 against undefined symbol `ELF' can not be used when making a PIE object" for an example of that.
Footnote 1: Hence errors like "unbalanced parenthesis" instead of something sensible like "character literal contains multiple characters".
mov $'abcd', %eax is another example of totally confusing the parser. It sees the b as a backward reference to a local label, like jmp 1b referencing a 1: label in the backwards direction. But the label number it's looking for here is 97, the ASCII value of 'a'. This is totally bonkers:
foo.s: Assembler messages:
foo.s:4: Error: backward ref to unknown label "97:"
foo.s:4: Error: junk `cd44%eax' after expression
foo.s:4: Error: number of operands mismatch for `mov'
All of this was tested with GNU assembler (GNU Binutils) 2.34, as reported by as --version.