gcc assembly gnu-assembler

junk `(0,1,1)' after expression


When I try to assemble the program, I get the following error messages:

misha@hp-laptop:~/test$ as -gstabs test.s -o test.o && ld test.o -o a.out && rm test.o && ./a.out
test.s: Assembler messages:
test.s:19: Error: junk `(0,0,1)' after expression
test.s:20: Error: junk `(0,1,1)' after expression
test.s:21: Error: junk `(0,2,1)' after expression
test.s:22: Error: junk `(0,3,1)' after expression

Can anybody please tell me what exactly I'm doing wrong that my program won't assemble? Obviously, it has something to do with the way I'm trying to access the array elements, each of which is one byte long. Here's the program itself:

/******************************************************************************
 *                                                                            *
 * This program prints the string "foo" on the console.                       *
 *                                                                            *
 ******************************************************************************/

.section .data
    array: .byte 0x00, 0x00, 0x00, 0x00  # An array of four bytes
    size:  .int  4                       # The size of the array


.section .text
.globl _start
_start:
    movb   $0x66, %ah   # 66 is the hexadecimal value for 'f'
    movb   $0x6F, %al   # 6F is the hexadecimal value for 'o'
    movb   $0x6F, %bh   # 6F is the hexadecimal value for 'o'
    movb   $0x0A, %bl   # A is the hexadecimal value for '\n'
    movb   %ah, array(0, 0, 1)
    movb   %al, array(0, 1, 1)
    movb   %bh, array(0, 2, 1)
    movb   %bl, array(0, 3, 1)

    # print
    movl   $4, %eax       # 4 is the number for the write system call
    movl   $1, %ebx       # The file descriptor to write to (1 - STDOUT)
    movl   $array, %ecx   # The starting address of the string to print
    movl   size, %edx     # The number of bytes to print
    int    $0x80          # Wake up the kernel to run the write system call

    # exit
    movl   $1, %eax       # 1 is the number for the exit system call
    movl   $0, %ebx       # Exit status code (echo $?)
    int    $0x80          # Wake up the kernel to run the exit system call

/*

Compile and run:

as -gstabs test.s -o test.o && \
ld test.o -o a.out && \
rm test.o && \
./a.out

*/

Solution

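  • There's no asm syntax for multi-dimensional arrays, unless you build it yourself with macros. Or maybe you arrived at array(0, 1, 1) by replacing the unused registers with 0 in the disp(base, index, scale) syntax. That doesn't work: base and index have to be registers, not numeric constants, so the assembler treats the parenthesized part as junk after the expression array.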

    What you can do is use an expression involving a label to get offsets from it, like movb $constant, array + 4.

    Looking at compiler output is often a good way to learn how to do things in asm, from syntax basics to clever optimization tricks. On the Godbolt compiler explorer:

    #include <string.h>
    char arr[100];   // uninitialized global array
    void copy_string(){ memcpy(&arr[4], "foo\n", 4); }
    
        // -O3 -fverbose-asm output:
        movl    $175075174, arr+4       #, MEM[(void *)&arr + 4B]
        ret
    
        .bss
        .align 32
    arr:
        .zero   100
    

    So, arr+4 is the syntax. We could write movl $const, arr+4(%eax), with i in %eax, to do something like the C expression arr[4 + i]. See this answer for a complete list of x86 addressing modes (mostly NASM / MASM syntax, but what really matters is what's encodable in machine code). See also the tag wiki.
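
    To make the address arithmetic concrete, here is a minimal C sketch of what a disp(base, index, scale) operand computes. The names effective_address and arr_plus_4_indexed are hypothetical, made up for this illustration; in real asm, base and index are registers and scale must be 1, 2, 4 or 8.

    ```c
    #include <stdint.h>

    /* Hypothetical model of an x86 memory operand disp(base, index, scale):
       the address is disp + base + index * scale. In real code, base and
       index are register values, never literal constants. */
    uintptr_t effective_address(uintptr_t disp, uintptr_t base,
                                uintptr_t index, unsigned scale)
    {
        return disp + base + index * scale;
    }

    char arr[100];   /* same global array as in the compiler example */

    /* Address that arr+4(%eax) refers to when %eax holds i:
       displacement arr+4, base i, no index register. */
    char *arr_plus_4_indexed(uintptr_t i)
    {
        return (char *)effective_address((uintptr_t)arr + 4, i, 0, 1);
    }
    ```

    Under these assumptions, arr_plus_4_indexed(i) is the same address as &arr[4 + i], which is the C view of that addressing mode.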


    Also notice how gcc puts uninitialized arrays in the .bss (rather than .data or .rodata). That means there aren't a bunch of zero-bytes in your executable. Instead of switching to the section with .bss, you could also use .comm array, 100 anywhere to declare array and reserve 100 bytes for it in the bss. It's probably less confusing to just use .bss.


    That immediate constant is of course 0x0a6f6f66, our string. gcc has cleverly optimized the memcpy into a single 4-byte immediate store, since it has no use for the value(s) to still be in a register afterwards. Remember that x86 is little-endian, so the 0x66 byte goes into array+0 and 0x0a goes into array+3. (gcc sucks at merging separate narrow stores other than with memcpy, though; see the godbolt link. clang is better at this.)
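
    If the byte order is hard to picture, this small C sketch checks it. The helpers pack_le32 and store_le32 are hypothetical names for this illustration, not anything gcc or as provides; they only mimic how a 4-byte little-endian store lays the value out in memory.

    ```c
    #include <stdint.h>

    /* Hypothetical helper: build the 32-bit immediate whose little-endian
       store puts b0 at the lowest address and b3 at the highest. */
    uint32_t pack_le32(unsigned char b0, unsigned char b1,
                       unsigned char b2, unsigned char b3)
    {
        return (uint32_t)b0
             | (uint32_t)b1 << 8
             | (uint32_t)b2 << 16
             | (uint32_t)b3 << 24;
    }

    /* Hypothetical helper: write value to out[0..3] in little-endian order,
       the way a 4-byte x86 store lays it out in memory. */
    void store_le32(unsigned char out[4], uint32_t value)
    {
        for (int i = 0; i < 4; i++)
            out[i] = (unsigned char)(value >> (8 * i));
    }
    ```

    Here pack_le32('f', 'o', 'o', '\n') gives 175075174 (0x0a6f6f66), and store_le32 on that value writes the bytes 'f', 'o', 'o', '\n' in increasing address order.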


    In NASM syntax, you could even write it with a "string" as an integer constant.

    mov    dword [array+off], `foo\n`    ;backquote for C-style \-escapes, unlike '' or ""
    

    GNU as doesn't allow this, except with something that's harder to read than a hex constant + comment:

    movl $('f' | 'o'<<8 | 'o'<<16 | '\n'<<24), array
    

    GNU as syntax isn't as friendly for hand-written asm as NASM/YASM, but it's nice in some ways. (%reg and so on makes it easy to see what's a register name and what's not.)


    Speaking of immediates: Your size = 4 should be an immediate constant, not a load.

    ##  size:  .int  4   # Don't need this stored in memory anywhere
    .equ size, 4
    ...
    
    movl   $array, %ecx   # The starting address of the string to print
    movl   $size, %edx    # The number of bytes to print
    

    Also note that movl $constant, (%ecx) takes fewer bytes to encode than movl $constant, array, because the absolute address has to be encoded as a 4-byte displacement. You could save code bytes by getting $array into %ecx sooner and then using a simple register addressing mode.