When I try to assemble the program, I get the following error messages:
misha@hp-laptop:~/test$ as -gstabs test.s -o test.o && ld test.o -o a.out && rm test.o && ./a.out
test.s: Assembler messages:
test.s:19: Error: junk `(0,0,1)' after expression
test.s:20: Error: junk `(0,1,1)' after expression
test.s:21: Error: junk `(0,2,1)' after expression
test.s:22: Error: junk `(0,3,1)' after expression
Can anybody please tell me what exactly I'm doing wrong that my program won't run? Obviously, it's something that has to do with the way I'm trying to get access to the array elements each of which is one byte long. Here's the program itself:
/******************************************************************************
* *
* This program prints the string "foo" on the console. *
* *
******************************************************************************/
.section .data
array: .byte 0x00, 0x00, 0x00, 0x00 # An array of four bytes
size: .int 4 # The size of the array
.section .text
.globl _start
_start:
movb $0x66, %ah # 66 is the hexadecimal value for 'f'
movb $0x6F, %al # 6F is the hexadecimal value for 'o'
movb $0x6F, %bh # 6F is the hexadecimal value for 'o'
movb $0x0A, %bl # A is the hexadecimal value for '\n'
movb %ah, array(0, 0, 1)
movb %al, array(0, 1, 1)
movb %bh, array(0, 2, 1)
movb %bl, array(0, 3, 1)
# print
movl $4, %eax # 4 is the number for the write system call
movl $1, %ebx # The file descriptor to write to (1 - STDOUT)
movl $array, %ecx # The starting address of the string to print
movl size, %edx # The number of bytes to print
int $0x80 # Wake up the kernel to run the write system call
# exit
movl $1, %eax # 1 is the number for the exit system call
movl $0, %ebx # Exit status code (echo $?)
int $0x80 # Wake up the kernel to run the exit system call
/*
Compile and run:
as -gstabs test.s -o test.o && \
ld test.o -o a.out && \
rm test.o && \
./a.out
*/
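There's no asm syntax for multi-dimensional arrays, unless you build it yourself with macros. Or maybe you came up with that by replacing unused registers with 0 in the (base, index, scale) syntax. The base and index slots take register names, not numbers, which is why the assembler rejects (0,0,1) as junk after the expression.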
What you can do is use an expression involving a label to get an offset from it, like movb $constant, array + 4.
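To make the addressing arithmetic concrete, here is a minimal C sketch of how an AT&T-syntax memory operand disp(base, index, scale) turns into an address. The function names effective_address and addr_of_element are made up for illustration, and the register arguments stand for register contents (0 when a register is omitted); this is a model of the arithmetic, not of the assembler itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the operand disp(base, index, scale).
 * base and index are the *contents* of the registers named in the operand;
 * pass 0 for a register that is left out. */
uint32_t effective_address(uint32_t disp, uint32_t base,
                           uint32_t index, uint32_t scale)
{
    /* Only scale factors 1, 2, 4 and 8 can be encoded. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return disp + base + index * scale;
}

/* "array + i" with no registers: the assembler/linker folds the constant
 * offset into the displacement, so the whole address is just disp. */
uint32_t addr_of_element(uint32_t array, uint32_t i)
{
    return effective_address(array + i, 0, 0, 1);
}
```

With a label at a made-up address such as 0x1000, addr_of_element(0x1000, 3) gives 0x1003, which is what movb %bl, array+3 would store to. The parenthesized slots only matter when part of the address is not known until run time.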
Looking at compiler output is often a good way to learn how to do things in asm, from syntax basics to clever optimization tricks. On the Godbolt compiler explorer:
#include <string.h>
char arr[100]; // uninitialized global array
void copy_string(){ memcpy(&arr[4], "foo\n", 4); }
// -O3 -fverbose-asm output:
movl $175075174, arr+4 #, MEM[(void *)&arr + 4B]
ret
.bss
.align 32
arr:
.zero 100
So arr+4 is the syntax. We could write movl $const, arr+4(%eax) to do something like the C expression array[4 + i]. See this answer for a complete list of x86 addressing modes (mostly NASM / MASM syntax, but what really matters is what's encodable in machine code). See also the x86 tag wiki.
Also notice how gcc puts uninitialized arrays in the .bss section (rather than .data or .rodata). That means there aren't a bunch of zero bytes in your executable. Instead of switching to that section with .bss, you could also use .comm array, 100 anywhere to declare array and reserve 100 bytes for it in the bss. It's probably less confusing to just use .bss.
That immediate constant is of course 0x0a6f6f66, our string. gcc has cleverly optimized the memcpy into a single 4-byte immediate store, since it has no use for the value(s) to still be in a register afterwards. Remember that x86 is little-endian, so the 0x66 byte goes into array+0 and 0x0a goes into array+3. (gcc sucks at merging separate narrow stores other than with memcpy, though; see the godbolt link. clang is better at this.)
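As a sanity check on that constant, here is a small C sketch that packs the four characters in little-endian order and confirms the resulting bytes spell the string. The helper names pack_le32, store_le32 and spells_foo are hypothetical, chosen only for this example.

```c
#include <stdint.h>
#include <string.h>

/* Pack four bytes so that b0 is the least significant byte, i.e. the one
 * a little-endian machine such as x86 stores at the lowest address. */
uint32_t pack_le32(unsigned char b0, unsigned char b1,
                   unsigned char b2, unsigned char b3)
{
    return (uint32_t)b0 | (uint32_t)b1 << 8 |
           (uint32_t)b2 << 16 | (uint32_t)b3 << 24;
}

/* Write value to dst one byte at a time in little-endian order.
 * This mimics what a 4-byte store does on x86, whatever the host is. */
void store_le32(unsigned char *dst, uint32_t value)
{
    for (int i = 0; i < 4; i++)
        dst[i] = (unsigned char)(value >> (8 * i));
}

/* Return 1 if storing value little-endian produces the bytes "foo\n". */
int spells_foo(uint32_t value)
{
    unsigned char buf[4];
    store_le32(buf, value);
    return memcmp(buf, "foo\n", 4) == 0;
}
```

pack_le32('f', 'o', 'o', '\n') evaluates to 0x0a6f6f66, which is 175075174 in decimal, the immediate in the gcc output above.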
In NASM syntax, you could even write it with a "string" as an integer constant.
mov dword [array+off], `foo\n` ;backquote for C-style \-escapes, unlike '' or ""
GNU as doesn't allow this, except with something that's harder to read than a hex constant + comment:
movl $('f' | 'o'<<8 | 'o'<<16 | '\n'<<24), array
GNU as syntax isn't as friendly for hand-written asm as NASM/YASM, but it's nice in some ways. (%reg and so on makes it easy to see what's a register name and what's not.)
Speaking of immediates: your size = 4 should be an immediate constant, not a load.
## size: .int 4 # Don't need this stored in memory anywhere
.equ size, 4
...
movl $array, %ecx # The starting address of the string to print
movl $size, %edx # The number of bytes to print
Also note that movl $constant, (%ecx) takes fewer bytes to encode than movl $constant, array, so you could save code bytes by getting $array into %ecx sooner and then using a simple register addressing mode.