Tags: assembly, buffer, x86-64, gnu-assembler, att

Assembly and Buffers: How To


I'm quite confused and have hit a roadblock. The assignment for my class has me do the following:

  1. Greet user
  2. Prompt for input
  3. Convert string into all caps
  4. Display message to user with converted string

I have no issues with 1 and 2, and when needed I can figure out the loop to convert lower case into upper case, something like:

cmp $96, %ah
jg Subtract
call Loop

Subtract:
    sub $32, %ah
    mov %ah, back to the array
    ret
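The compare above only checks the lower bound, so `{`, `|`, `}`, `~` (123 to 126) and DEL (127) would also get 32 subtracted; lower-case letters are only the range 97 (`'a'`) to 122 (`'z'`). Since the course library can't be run outside the class, here is the per-character rule as a minimal C sketch; `upcase_char` is a hypothetical name. In assembly, the same rule needs two compares (against 97 and against 122) before the `sub $32`.

```c
/* Hypothetical helper (not part of the course library): upper-case one
 * ASCII character. Only 'a' (97) through 'z' (122) are changed. */
char upcase_char(char c)
{
    if (c >= 'a' && c <= 'z')   /* check both bounds, not just "> 96" */
        return (char)(c - 32);  /* 'a' - 'A' == 32 in ASCII */
    return c;                   /* digits, punctuation, upper case: as-is */
}
```

With the lower-bound-only check, `'{'` (123) would become `'['` (91); here it is returned unchanged.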

That may not be the best way, but I can work that out once I understand this array and buffer business. The way the prof has us do things involves using his library. To get user input, the code looks like this:

.data
Intro:
     .ascii "Hey, enter what you want converted.\n\0"
Task:
     .space 5  #This is the buffer that is supposed to limit what the user can enter.... I'm very confused about how to make this work

.text
.global _start

_start:

    mov $Intro, %rax
    call PrintCString #from the Prof's library
    mov $Task, %rbx
    call ScanCString

Here's what he says about using ScanCString. Input: rax, rbx. Notes: Scans a null-terminated string and stores it into the address %rax. The register %rbx must contain the maximum number of characters that can be read (the size of the buffer).
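Going by that description, ScanCString takes the buffer's address and its size, and never writes past the end of the buffer. A minimal C sketch of that contract, built on `fgets`: `scan_cstring` is a hypothetical stand-in, and it assumes the size includes the terminating `\0` and that a trailing newline is dropped, neither of which the professor's notes confirm.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical C model of ScanCString: buf plays the role of rax (buffer
 * address) and size the role of rbx (buffer size). At most size - 1
 * characters are stored, followed by '\0', so the buffer cannot overflow.
 * Assumptions: size includes the terminator; a trailing newline is dropped. */
char *scan_cstring(FILE *in, char *buf, size_t size)
{
    if (buf == NULL || size == 0)
        return NULL;                   /* no usable buffer: do nothing */
    if (fgets(buf, (int)size, in) == NULL) {
        buf[0] = '\0';                 /* EOF or read error: empty string */
        return buf;
    }
    buf[strcspn(buf, "\n")] = '\0';    /* drop the newline, if one fit */
    return buf;
}
```

With `char task[5]` and the input `hello world`, only `hell` plus the terminator is stored, mirroring `.space 5` with 5 in rbx; the rest of the line stays unread in the input.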

My idea from here, as you can probably gather from above, is to move each character, determine whether it's upper or lower case, and adjust accordingly. I'll run this through a loop and then print the result back to the user.
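That plan can be sketched in C as a stand-in for the assembly (`upcase_string` is a hypothetical name). Looping until the `\0` terminator means the loop needs no separate length as its limit, and each iteration is just a one-byte load, a range check, and a one-byte store back to the same address.

```c
#include <stddef.h>

/* Upper-case a null-terminated ASCII string in place; return its length.
 * Stopping at '\0' means no separate length is needed as a loop limit. */
size_t upcase_string(char *s)
{
    size_t i = 0;                       /* index into the buffer */
    for (; s[i] != '\0'; i++) {         /* stop at the terminator */
        if (s[i] >= 'a' && s[i] <= 'z') /* lower-case letters only */
            s[i] = (char)(s[i] - 32);   /* adjust and store back in place */
    }
    return i;                           /* number of characters processed */
}
```

For example, `"hi there {x}"` becomes `"HI THERE {X}"` and the function returns 12.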

Here's everything I have so far; please excuse the mess from my testing as I go. Any help will be greatly appreciated.

.data
    Intro:
         .ascii "\nYup... Mr. Meekseeks here to help.  What ya want me to do?\n\n\0"
    Task:
         .space 5
    NewLine:
         .ascii "\n\n\0"
    Goodbye:
         .ascii "\nYou got it buddy I'll get right on doing \0"
    Test:
         .ascii "\nYou made it to the first loop\n\0"
    Test2:
         .ascii "\nYou made it past the first compare\n\0"
    Test3:
         .ascii "\nYou added 1 to the pointer\n\0"
    Test4:
         .ascii "\nHere's the string length \0"

.text

.global _start

_start:

    mov $Intro, %rax                #start with greeting the user
    call PrintCString               #Print the greeting to the user
    mov $0, %rax
    mov Task(,8), %rbx              #move the buffer into RBX, prep for input
    call ScanCString                #User Input
#   mov %rax, Task
#   mov %rax, %rdx                  #Move the message so it's not destroyed
    call LengthCString              #Determine loop limit
    mov $0, %rdi                    #set pointer
    mov %rax, %rcx                  #set counter to zero
    mov Task(%rdi), %rax

    call PrintCString

    mov $Test4, %rax
    call PrintCString
    mov %rcx, %rax
    call PrintInt
    mov %rax, %rcx
    mov $Test, %rax
    call Loop

Loop:
    mov $Test2, %rax
    call PrintCString
    add $1, %rdi
    mov $Test3, %rax
    call PrintCString
    cmp %rdi, %rcx
    je Closing
    call Loop

#   movb %rax, %rdx
#   mov %rcx, %rax
#   call PrintInt
#   call PrintCString

#   call Ending

#Adding:

# Greeting:

#   mov $Intro, %rax
#   call PrintCString
#   ret

Closing:
    mov $Goodbye, %rax
    call PrintCString

    call Ending

Ending:

    mov $NewLine, %rax
    call PrintCString
    call EndProgram

Solution

  • ScanCString (Input: rax, rbx): Scans a null-terminated string and stores it into the address %rax. The register %rbx must contain the maximum number of characters that can be read (the size of the buffer).

    It looks like you should call it like this:

        mov  $Task, %rax   # set rax to point to the buffer in memory
        mov  $5, %rbx      # size of buffer (5 bytes)
        call ScanCString
    

    Yours:

        mov $0, %rax
    

    This sets rax to zero. ScanCString uses rax as the buffer pointer, so it will be handed a null pointer and will probably crash (or return early, if it checks for null).

        mov Task(,8), %rbx              #move the buffer into RBX, prep for input
    

    This loads eight bytes from memory at address Task. (Actually it's more likely a syntax error: I don't think (,8) will parse, but I'm no AT&T syntax expert. If it does parse as something like Task(<noreg>,<noreg>,8), a scale factor of 8 makes no sense without an index register, so it would effectively add a zero offset to the Task address and load 8 bytes from there.)

    Since Task: is followed by .space 5, the load reads the 5 bytes of that reservation (zeros at this point, as .space fills with zero by default) plus the 3 bytes that come after them, to fill all 8 bytes of rbx.

        call ScanCString                #User Input
    #   mov %rax, Task
    

    This would store all 8 bytes of rax into memory at Task, where only 5 bytes are reserved, so it overwrites the 3 bytes beyond the buffer.
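In the question's full program, Task: .space 5 is followed directly by NewLine: .ascii "\n\n\0", which is exactly 3 bytes, so those are the bytes such a store would destroy (there is no alignment directive between the two labels, so nothing separates them). A minimal C sketch of the effect: `store_qword` (hypothetical) writes a 64-bit value byte by byte in x86 little-endian order, and an 8-byte array stands in for the two adjacent labels.

```c
#include <stdint.h>

/* Simulate the x86-64 store `mov %rax, Task`: all 8 bytes of rax are
 * written starting at addr, low byte first (x86 is little-endian).
 * Nothing stops the store at the end of a 5-byte reservation. */
void store_qword(unsigned char *addr, uint64_t rax)
{
    for (int i = 0; i < 8; i++)
        addr[i] = (unsigned char)(rax >> (8 * i));
}
```

Starting from `unsigned char data[8] = {0, 0, 0, 0, 0, '\n', '\n', '\0'}` (Task followed by NewLine), calling `store_qword(data, 0x1122334455667788)` leaves `data[5]` to `data[7]` as 0x33, 0x22, 0x11, so the NewLine string is gone.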

    #   mov %rax, %rdx                  #Move the message so it's not destroyed
    

    This copies rax into rdx. It's not clear what ScanCString returns in rax, so there's no telling what these two instructions store into memory and into rdx.

        call LengthCString              #Determine loop limit
    

    This one probably expects a pointer to the string in rax, so a mov $Task, %rax right before it would seem logical to me, but you didn't post the documentation for LengthCString.

    ...
    jg Subtract
    ...
    
    Subtract:
        ...
        ret
    

    jg is a plain jump: it doesn't push a return address onto the stack, so the ret at the end is wrong. A ret needs a matching call (unless you know what you're doing and put a return address on the stack by some other means than call, but that's not a topic for beginners).

    Task:
         .space 5  #This is the buffer that is supposed to limit
         # what the user can enter....
         # I'm very confused about how to make this work
    

    Nope, this just reserves 5 bytes in the current section (.data). It doesn't limit anything at run time. With GNU as, .space without a fill argument fills the bytes with zero, so the buffer starts out as five zero bytes. (Next time, please also say which target platform you're on.)

    That 5 exists only in the source code; it isn't part of the resulting machine code, so it's known to the assembler but not to your program, which is why you have to pass it to ScanCString yourself in rbx. To follow DRY, define a symbolic constant such as buffer_length = 5 (GNU as also accepts .equ buffer_length, 5), then write Task: .space buffer_length in the data section and mov $buffer_length, %ebx before the scan call. Writing ebx is enough, since you won't need a length of 4+ GB, and writing a 32-bit register clears the upper 32 bits of rbx. That way the magic constant 5 lives in exactly one place in the source.

    Overall, it looks like you haven't yet grasped the whole concept of what a register is, what computer memory is, how the CPU interacts with it, etc. Try reading a book or tutorial with explanations, along with your course notes. Copying existing code and adjusting it until it produces the output you expect works much better for high-level languages than for assembly; in assembly, try to fully understand how things work. The good news is that the computer is actually a very simple machine, a state automaton behaving deterministically, with only a few possible instructions to execute at each step, so understanding it isn't that hard. Just don't apply human expectations or logic to it: it's a calculating machine, not a high-level language written to be read by humans.

    Also, when writing assembly, it's essential to verify in a debugger that each instruction works as expected: single-step over it and check every state change it produces. When something doesn't match, consult an instruction reference to make sure you fully understand what that instruction does. There are condensed versions on the web like http://www.felixcloutier.com/x86/ if you don't want to dig into Intel's original PDF manuals (freely available from Intel's website). Don't judge what an instruction does by its name; notorious examples (with plenty of questions on SO) are mul and div, which don't follow common expectations at all.

    Also, AT&T syntax is IMO more "machine-like": great for machine parsing and very precise, but harder to read and write casually than the more relaxed Intel syntax.

    Especially things like:

    mov  $Task, %rax   # set rax to point to the buffer in memory
    mov  Task, %rax    # load rax with the 64-bit value stored in memory at address Task
    

    This makes perfect sense for machine parsing, with the $ distinguishing an immediate value from a memory reference, but never forget to write that $ by hand when you want the address of the memory rather than its content.
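Roughly, `$Task` corresponds to taking a C array's address, and `Task` as a memory operand corresponds to reading the 8 bytes stored at that address. A minimal C sketch using a hypothetical 8-byte buffer `task` holding "ABCDEFGH"; the load is assembled byte by byte so it mirrors x86-64 little-endian order regardless of the host compiler.

```c
#include <stdint.h>

/* Hypothetical 8-byte buffer standing in for the Task label. */
static unsigned char task[8] = { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H' };

/* Like `mov $Task, %rax`: the ADDRESS of the buffer. */
const unsigned char *address_of_task(void)
{
    return task;
}

/* Like `mov Task, %rax`: the 8 BYTES stored there, byte 0 as the low byte. */
uint64_t load_qword_from_task(void)
{
    uint64_t rax = 0;
    for (int i = 7; i >= 0; i--)
        rax = (rax << 8) | task[i];
    return rax;
}
```

`load_qword_from_task()` returns 0x4847464544434241: the text itself, not a pointer. Passing a value like that to PrintCString, as `mov Task(%rdi), %rax` in the question does, makes the routine treat the string's bytes as an address.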