Search code examples
stringassemblymipsstring-concatenationlow-level

How do I reverse the line order of a multi-line string in MIPS?


I have a multi-line message of the form:

.data
    msg:
        .ascii "aYZ B"
        .byte 10
        .ascii "234"
        .byte 10
        .ascii "b cd A"
        .byte 10

And I need to reverse the order the lines are printed in so that:

aYZ B ---------------- b cd A

234 ----becomes--- 234

b cd A ---------------- aYZ B

My general idea so far is to push the address of the first char onto the stack, then iterate through the message (base address of msg + offset counter) and push the address of every char immediately after a '\n' char (.byte 10) onto the stack (index of '\n' char + 1).

Then, I will be able to pop the first letter in each line off the stack in reverse order.

What I am struggling with is how I can modify the original msg as I loop through it. Should I just build a new msg in reverse order? If so, how? I'm guessing I would use string concatenation for this?

Lastly, how do I print that message out? (I can use syscall 4 but I would need the entire message stored in one label).

EDIT:

So I managed to put a solution together and works almost correctly. It has one small bug: the very last line of the message doesn't print on its own line, it just prints immediately after the second to last line.

If anyone knows how to fix that small issue I would love to know how.

.data
    key: .byte 1
    msg:
        .ascii "ayZ B"
        .byte 10
        .ascii "234"
        .byte 10
        .ascii "b cD a"
        .byte 10 
.text
main:
    jal reverseLinesAndPrint
    # Exit Program
    li $v0, 10
    syscall

reverseLinesAndPrint:           # $s3 contains address of last char, $s0 contains offsets for both front and back, but must be reset before using for back 
    li $s0, 0                   # RESET value of msg position offset index to iterate from beginning again
    lineLoop:
    add  $s1, $t0, $s0          # Set $s1 equal to the base address of msg ($t0) + the position offset index ($s0)
    lb   $s2, ($s1)             # Deref. and move the current char into s2 for checking
    bne $s2, $zero, notLastChar # If the current char is not the last char in the msg, keep looping
        subi $s3, $s1, 1        # Subtract 1 from the ADDRESS of $s1 to get the last char ('\n') before the NULL Terminator and store it in $s3
        j lastCharIndexFound    # Exit the loop by jumping past it
    notLastChar:
    addi $s0, $s0, 1            # Increment the position offset index
    j lineLoop

    lastCharIndexFound:         # We now have the address of the last valid char in message (always '\n') stored in $s3
    li $s0, 0                   # RESET value of msg position offset index to iterate from ending this time
    reverseLineLoop:
    sub $s1, $s3, $s0           # This time, we are going to subtract from the starting address so we can iterate backwards over msg
    bne $t0, $s1, notFirstChar  # If we iterate all the way to the very first char in msg, exit the loop
        li $v0, 4               # Since the first char doesn't have a '\n' char, we have to manually print
        move $a0, $s1
        syscall
        j exit                  
    notFirstChar:
    lb $s2, ($s1)               # Deref. and move the current char into s2 for checking
    bne $s2, 10, notNLChar      # If we find a '\n' char, we need to do some logic
        li $v0, 4               # First we need to call a syscall to print string (it will stop on the previous '\n' which is now NULL)
        move $a0, $s1
        syscall
        sb $zero, ($s1)         # Second, we need to replace that current '\n' char with NULL
    notNLChar:                  # If the current char is not '\n', keep looping
    addi $s0, $s0, 1            # Increment the offset
    j reverseLineLoop           # Jump to next iteration

exit:
    jr $ra  # Jump back to main

Solution

  • I think you're using the newline before a line to separate it from the line you printed previously. This is a clever idea and more efficient than printing just the line (without the preceding newline). Otherwise you'd have to make a separate print-single-char system call, like syscall with $v0 = 11 / $a0 = '\n' (MARS system calls)

    This means your output is like "\nline3" then "\nline2", etc. leaving the cursor at the end of each line.

    But you need to special-case the very last line (first of the input string) because there's no \n before it. You already are special casing it, so simply print a \n manually before it, as a line ending for the previous line, with the print-char syscall.


    Another way to do this might be to store a 0 over the character after the newline, at 1($s1), so when you later reach the start of this line you can print it as "line2\n" including a newline at the end. (My version of this included below.)

    The special case becomes the final line of input (first line of output), but it's actually fine to store a 0 byte after its newline if you have a 0-terminated C-style implicit-length string. There will already be one there, so you could skip that when entering the outer loop, or not if it's more convenient not to.


    Without modifying the data: write(1, line, length)

    MARS has a write() system call ($v0=15) that takes a pointer + length, so you don't need strings to be 0-terminated. Exactly like POSIX write(int fd, char *buf, size_t len). file-descriptor $a0 = 1 is stdout in MARS4.3 and later.

    When you find a newline, you can record that position and keep looping. When you find another one, you can do subu $a2, $t1, $t0 ($a2 = end - start) to get a length, and set $a1 pointing at the char after the newline.

    So you can print chunks of your choice without having to mangle the input data, making it usable on read-only inputs or without making a copy to destroy for something you need later.


    Other stuff / code review:

    Your code is weird; you call reverseLinesAndPrint without putting a pointer and length or end-pointer in registers in main, so why make it a separate function at all? It's not reusable.

    The normal thing to do would be to put another label at the end of your block of ASCII data so you can get that address into a register without having to scan through the string to find the length. (Especially because you don't explicitly have a 0 byte at the end of the string to terminate it. There is one because you didn't put any other data right after, and MARS leaves a gap between data and code when you use the memory model that puts the start address of the data section at address 0.)

    And you never even use la $reg, msg. It seems you hard-code its address as 0? And you read $t0 without initializing it first. MARS starts out with all registers zeroed. (So it's possible to miss bugs like this because that's a valid address with the memory layout you chose.)

    In the normal MIPS calling convention, $s registers are call-preserved ("saved"), aka non-volatile. But your function uses them as temporaries without saving/restoring them. Using the $t registers instead (and $a0..3 and $v0..1) would be normal.

    Your loops are inefficient: you can put the conditional branch at the bottom like a do{}while(). The way you're writing loops is really clunkly and involves 2 taken branches per loop iteration (including the unconditional loop branch). Or 3 for your search loop where you need to check for \n and for p == end.

    // your loops are over-complicated like this:
    do {
     loop body;
     if (p == end) {  // conditional branch over the loop epilogue
       stuff;         // put this after the loop instead of jumping over it inside the loop
       goto out;
     }
     counter increment;
    } while(1);
    out:
    

    Also: write a block of comments somewhere that say what each register is for. For some it can be on an instruction that initializes the register.

    In general your comments are pretty good, mostly describing at a higher level what's going on, not stuff like "add 1 to $s0" that you can already see from the actual instruction.


    Here's how I did it

    I used the idea of overwriting the first character of a line after printing it. This is the byte that follows a newline. So when we print lines, they're like line2\n not \nline2.

    I also put a label at the end of msg instead of using a strlen loop. If you are going to iterate forward over the string, yes you should save pointers somewhere (e.g. on the stack) as you go to save work later, like you initially were thinking. But for assemble-time-constant strings, we can just have the assembler tell us where the end it. I also packed the lines into one .ascii string to keep the source more compact. I added an explicit .byte 0 terminator (instead of .asciiz) so I could have the label on the terminator instead of after.

    I of course use pointers, not indices, so I didn't need an add for indexing inside the loop. I used lbu in case zero-extension is more efficient than sign-extension to 32-bit. I'd rather think of char values as small integers from 0..255 than -128..127. Not that I do any signed comparisons on them, only for equality.

    I used addiu because I don't want to trap on signed overflow on pointer math. The only reason to ever use add instead of addu is to trap on signed overflow.

    The inner loop still needs 2 conditional branches to check for both termination conditions, but this is an example of how compact and efficient you can make a loop like this with careful planning.

    .data
     msg:
        .ascii "ayZ B\n234\nb cD a\n"
     endmsg:         # 0 terminated *and* we have a label at the end for convenience.
        .byte 0
    
    .text
    main:
        la   $a0, endmsg
        la   $a1, msg
        jal reverseLinesAndPrint   # revprint(end, start)
    
        li $v0, 10
        syscall           # exit()
    
    reverseLinesAndPrint:
    # $a0 = end of string.  We assume it's pointing at a '\0' that follows a newline
    # $a1 = start of string
    # $t2 = tmp char
    # we also assume the string isn't empty, i.e. that start - end >= 2 on function entry.
    
    # the first char the inner loop looks at is -2(end)
    
     #move  $t0, $a0          # instead we can leave our args in a0, a1 because syscall/v0=4 doesn't modify them
    
     lines:                       
       findNL_loop:                    # do {  // inner loop
         addiu  $a0, $a0, -1             # --p
         beq    $a1, $a0, foundStart     # if(p==start) break
         lbu    $t2, -1($a0)                # find a char that follows a newline
         bne    $t2, '\n', findNL_loop   # }while(p[-1] != '\n');
       # $a0 points to the start of a 0-terminated string that ends with a newline.
       foundStart:
        li     $v0, 4
        syscall                        # fputs(p /*$a0*/, stdout)
    
        sb     $zero, ($a0)            # 0-terminate the previous line, after printing
        bne    $a0, $a1, lines         # } while(!very start of the whole string)
    
        jr $ra
    

    Tested and works with your data. Not tested for corner cases like an empty first line, although it does work for an empty last line. I think I avoid reading before the first character in all cases (except for too-short inputs that violate the pre-conditions in comments. You could check for those before entering the loop if you wanted to handle them.)

    Note that bne $t2, 10, target is a pseudo-instruction. If I was optimizing more, I'd hoist that 10 out of the loop with an li into $t3 or something, instead of letting the assembler set up that constant in a register every iteration. Same for the li $v0, 4 - that syscall has no return value so it doesn't even destroy $v0.

    Using offsets like -1($a0) in addressing modes is "free" - the instruction has 16 bits of immediate displacement so we might as well use it instead of separate pointer math.

    I used $t2 instead of $t0 for no real reason, just to have a unique number for every reg I was using for human readability with MARS smallish font.