It got a bit tricky when I tried to write some code that would print five lines with four asterisks in each one.
****
****
****
****
So I thought a nested loop could save the day. Boy, was I wrong.
So I made an inner loop for the asterisks and an outer loop for the newlines, as below:
.text
.globl main
main:
    add $t0, $zero, $zero       # i counter for the inner loop
    add $t2, $zero, $zero       # j counter for the outer loop
outerloop:
innerloop:
    slti $t1, $t0, 4            # while (i<4)
    beq $t1, $zero, innerexit
    li $v0, 11                  # printf("*");
    li $a0, '*'                 # character constant goes in $a0 by value
    syscall
    addiu $t0, $t0, 1           # i++
    j innerloop
innerexit:
    slti $t3, $t2, 5            # while (j<5)
    beq $t3, $zero, outerexit
    li $v0, 11                  # printf("\n");
    li $a0, '\n'
    syscall
    addiu $t2, $t2, 1           # j++
    j outerloop
outerexit:
    li $v0, 10
    syscall
But the output gives me just one line:
****
What's the matter with the outer loop?
The simplest way would be to use the write-string system call N times, with a non-nested loop. (Well, arguably making one long string containing all the lines would be even "simpler", but less maintainable, and bad for the size of your program with large N.)
Note the use of down-counters, counting down towards zero so we can bne against $zero. This is idiomatic for asm, and so is putting the conditional branch at the bottom of the loop, especially for any loop where you know the trip-count is guaranteed to be at least 1. (When that's not the case, you'd normally use a branch outside the loop to skip it if needed.)
## Tested, works in MARS 4.5
.data
line: .asciiz "****\n"
.text
.globl main
main:
    li $t0, 4                   # row counter
    li $v0, 4                   # print string call number
    la $a0, line
.printloop:
    syscall                     # print_string has no return value, doesn't modify v0
    addiu $t0, $t0, -1
    bnez $t0, .printloop        # shorthand for BNE $t0, $zero, .printloop
    li $v0, 10                  # exit
    syscall
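For comparison, here's roughly the same loop shape in C: a do-while that counts down to zero and tests at the bottom, with a separate guard in front for the case where the trip count might be 0. It's only a sketch; print_rows is a made-up name, and fputs stands in for the print_string syscall.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper mirroring the asm loop shape: a down-counter tested
   at the bottom (do-while), plus one guard branch outside the loop for a
   trip count of 0. Returns how many times the loop body ran. */
int print_rows(const char *line, int rows)
{
    assert(line != NULL);
    int iterations = 0;
    if (rows <= 0)              /* guard outside the loop: skip it entirely */
        return 0;
    do {
        fputs(line, stdout);    /* one print_string per row */
        iterations++;
    } while (--rows != 0);      /* like addiu $t0, $t0, -1 / bnez $t0, ... */
    return iterations;
}
```

With a trip count known to be at least 1 (like the constant 4 above), the guard can simply be dropped.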
You could generate the string in a temporary buffer, with a count from a register, in a separate loop before the print loop. That way you can still support runtime-variable row and column counts, with two sequential loops instead of nested ones.
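A rough C sketch of that two-sequential-loops idea, assuming the row and column counts only arrive at runtime (print_grid and its parameters are hypothetical, and fputs stands in for print_string):

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical sketch with runtime row/column counts.
   Loop 1 builds a single row in buf; loop 2 prints that row `rows` times.
   The loops run one after the other, not nested.
   buf must hold at least cols + 2 bytes: the stars, '\n', and '\0'. */
void print_grid(char *buf, int cols, int rows)
{
    assert(buf != NULL && cols >= 0 && rows >= 0);
    int i;
    for (i = 0; i < cols; i++)      /* loop 1: build the row */
        buf[i] = '*';
    buf[i] = '\n';
    buf[i + 1] = '\0';
    for (int r = 0; r < rows; r++)  /* loop 2: one print_string per row */
        fputs(buf, stdout);
}
```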
With a buffer aligned by 4, we can store a whole word of 4 characters at once, so the row-building loop doesn't have to run as many iterations. (li reg, 0x2a2a2a2a takes 2 instructions but li reg, 0x2a2a only takes one, so going by 2 with sh would make the code smaller.)
.text
.globl main
main:
.eqv WIDTH, 5
.eqv ROWS, 4
    addiu $sp, $sp, -32         # reserve some stack space. (WIDTH&-4) + 8 would be plenty, but MARS doesn't do constant expressions.
    move $t0, $sp
    addiu $t1, $sp, WIDTH       # pointer to end of buf = buf + line length; could be a register
    li $t2, 0x2a2a2a2a          # MARS doesn't allow '****' or even '*' << 8 | '*'
.makerow:                       # do{
    sw $t2, ($t0)               # store 4 characters
    addiu $t0, $t0, 4           # p+=4
    sltu $t7, $t0, $t1
    bnez $t7, .makerow          # }while(p < endp);
    # overshoot is fine; we reserved enough space to do whole word stores
    li $t2, '\n'
    sb $t2, ($t1)
    sb $zero, 1($t1)            # terminating 0 after newline. Unfortunately an sh halfword store to do both at once might be unaligned
    move $a0, $sp
    li $t0, ROWS
    li $v0, 4                   # print string call number
.printloop:
    syscall                     # print_string has no return value, doesn't modify v0
    addiu $t0, $t0, -1
    bnez $t0, .printloop        # }while(--t0 != 0); shorthand for BNE $t0, $zero, .printloop
    ## If you were going to return instead of exit, you'd restore SP:
    # addiu $sp, $sp, 32
    li $v0, 10                  # exit
    syscall
As expected, this prints 5 asterisks on every row.
*****
*****
*****
*****
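The word-store trick in .makerow can be sketched in C as well. This is an illustration under assumptions, not a drop-in translation: make_row_by_words is a made-up name, and it uses memcpy of a uint32_t instead of a pointer cast so the C version doesn't depend on buf being aligned (compilers typically turn a fixed 4-byte memcpy into a single store).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C sketch of the .makerow loop.
   0x2a is '*' in ASCII, so 0x2a2a2a2a is four '*' bytes and each 32-bit
   store writes 4 characters. Like the asm, the last store may overshoot
   `width`, so buf needs `width` rounded up to a multiple of 4, plus 2. */
void make_row_by_words(char *buf, size_t width)
{
    const uint32_t stars = 0x2a2a2a2a;
    assert(buf != NULL);
    for (size_t i = 0; i < width; i += 4)
        memcpy(buf + i, &stars, sizeof stars);  /* one word = 4 characters */
    buf[width] = '\n';                          /* overwrites any overshoot here */
    buf[width + 1] = '\0';
}
```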
Generally (in real systems) a system call is much more expensive than normal instructions, so preparing a single large buffer with multiple newlines would actually make sense. (The overhead of a system call dwarfs the difference between writing 1 vs. 5 or even 20 bytes, so even though calling print_string instead of print_char is kind of hiding work inside the system call, it's justified.)
In that case you probably would want nested loops, but with sb / addiu $reg, $reg, 1 pointer-increments instead of syscall. Only make one system call at the very end.
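A minimal C sketch of that idea, with fwrite standing in for a single write-style system call (build_then_print is a hypothetical name; the caller is assumed to supply a buffer of rows * (cols + 1) bytes):

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical sketch: nested loops that only store a byte and bump a
   pointer (like sb / addiu $reg, $reg, 1), then ONE output call at the end.
   buf must hold rows * (cols + 1) bytes. Returns the number of bytes built. */
size_t build_then_print(char *buf, int rows, int cols)
{
    assert(buf != NULL && rows >= 0 && cols >= 0);
    char *p = buf;
    for (int r = 0; r < rows; r++) {
        for (int c = 0; c < cols; c++)
            *p++ = '*';                 /* store byte, increment pointer */
        *p++ = '\n';
    }
    size_t len = (size_t)(p - buf);
    fwrite(buf, 1, len, stdout);        /* the only output call */
    return len;
}
```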
Or use one loop to store all the * characters 4 at a time (for ROWS * (COLS+1) / 4 iterations, rounded up, so the newline positions are covered too), then another loop that writes the '\n' newlines where they belong. This lets you get all the data into memory with fewer instructions than doing everything in order 1 byte at a time. (For very large row*col counts, you would probably limit your buffer size to 4 or 8 KiB or something, so your data is still in cache when the kernel's system call handler reads it to copy it to wherever it needs to be.)
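Here's one way the store-then-fix-up version could look in C. It's a sketch under the assumption that the grid is laid out as rows of cols stars plus one newline each, so pass 2 overwrites bytes that pass 1 already filled; build_grid_two_pass is a made-up name, and the result isn't 0-terminated, so it's meant for a length-based write.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical two-pass sketch. Pass 1 fills the whole grid with '*',
   4 bytes per store; pass 2 writes '\n' over the last byte of each row.
   The grid is rows * (cols + 1) bytes; because pass 1 may overshoot, buf
   must have that length rounded up to a multiple of 4. Returns the length. */
size_t build_grid_two_pass(char *buf, size_t rows, size_t cols)
{
    const uint32_t stars = 0x2a2a2a2a;          /* "****" in ASCII */
    size_t len = rows * (cols + 1);
    assert(buf != NULL);
    for (size_t i = 0; i < len; i += 4)         /* pass 1: 4 stars per store */
        memcpy(buf + i, &stars, sizeof stars);
    for (size_t r = 0; r < rows; r++)           /* pass 2: place the newlines */
        buf[r * (cols + 1) + cols] = '\n';
    return len;
}
```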
BTW, in C terms, the print char system call is more like putchar('*'), not printf("*"). Note that you're passing it a character by value, not a pointer to a 0-terminated string.
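Translating the question's nested loop into those C terms also shows why it printed only one row. The asm zeroes $t0 (i) once, before outerloop, so after the first row i stays at 4 and every later pass through innerloop exits immediately, leaving only newlines. In the asm, the fix would be re-zeroing $t0 right after the outerloop label, and also moving the j < 5 test ahead of the inner loop, since as written the inner loop runs once more before j is checked (six rows instead of five). A minimal sketch, where print_stars is a hypothetical name:

```c
#include <assert.h>
#include <stdio.h>

/* The question's nested loop in C terms (hypothetical name).
   putchar takes the character by value, like the print_char syscall.
   Returns the number of '*' characters printed. */
int print_stars(int rows, int cols)
{
    assert(rows >= 0 && cols >= 0);
    int stars = 0;
    for (int j = 0; j < rows; j++) {    /* test j BEFORE printing a row */
        int i = 0;                      /* reset i every row: the missing step */
        while (i < cols) {
            putchar('*');
            stars++;
            i++;
        }
        putchar('\n');
    }
    return stars;
}
```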