Little Man Computer Program to output 1 with delay of 5 seconds

Suppose a Little Man Computer program requires 1 microsecond to execute one instruction. I need to write a LMC program that takes an input between 0 and 10 (inclusive), and produces an output of one 1 but after a delay of that many seconds.

For example, an input of 5 will produce an output of 1, five seconds later. The delay need not be perfect but must be accurate within 0.01% or 100 µsec. How can I achieve that?

Solution

You'll need to create loops to spend time. You'll want to have a code block that will take approximately 1 second to execute, and then repeat the execution of that block corresponding to the number that was input. So the challenge is to design code that takes 1 second to execute.

Let's try that with a loop:

       LDA start
loop   STA count
       LDA count
       SUB one
       BRP loop

       ; ...other code...

one    DAT 1
start  DAT 999
count  DAT

This loop would have 1000 iterations. Each iteration has 4 instructions, so the 1000 iterations take 4000µs. We could beef up the code a bit to make the loop have more instructions, but we are nowhere near 1 second, so we better employ a second loop that repeatedly executes the above. That outer loop would need to iterate some 250 times, since 250*4000 = a million, i.e. 1 second.

So we would have:

       LDA start2
loop2  STA count2

; here starts the code we used earlier
       LDA start3
loop3  STA count3
       LDA count3
       SUB one
       BRP loop3
; end of the code we used earlier

       LDA count2
       SUB one
       BRP loop2
       
       ; ... other code

one    DAT 1
start2 DAT 249
start3 DAT 999
count2 DAT
count3 DAT

If we make a precise calculation of the number of instructions that are executed, then we arrive at this:

The inner loop executes 4*1000 instructions
The outer loop has 2 instructions before the inner loop, and 3 after, so in total it has an "overhead" of 5 instructions.
So that amounts to 250*(5+4000) instructions, i.e. 1001250.

This is a bit too much (because of the overhead): the deviation is about 0.1%, so we need to tune a bit the numbers.

But let's first finish the program by implementing the outer loop that executes the above code as many times as is specified in the input:

       INP
       BRZ output ; output immediately!
       SUB one

; a new outer loop: 
loop1  STA count1

; start of what we already had:
       LDA start2
       
loop2  STA count2
       LDA start3

loop3  STA count3
       LDA count3
       SUB one
       BRP loop3

       LDA count2
       SUB one
       BRP loop2
; end of what we already had

       LDA count1
       SUB one
       BRP loop1

output LDA one
       OUT
       HLT

one    DAT 1
start2 DAT 249
start3 DAT 999
count1 DAT
count2 DAT
count3 DAT

Note that the outer loop that we added also has its overhead... again 5 instructions overhead if we include LDA start2 in that count.

One iteration of loop1 thus takes 5 + 1001250 = 1001255.

Finally, let's now tune the start2 and start3 values so that we get closer to 1000000. Playing with pairs of numbers in a spreadsheet, you can find that you get close with start2 = 541 for 542 iterations, and start3 = 459 for 460 iterations (There are other interesting pairs you could use, with similar results).

Instructions executed in one iteration of loop3: 4
Instructions executed in one iteration of loop2: 5 + (460*4) = 1845
Instructions executed in one iteration of loop1: 5 + (542*1845) = 999995

So we need 5 more instructions to get to a perfect second. That's easy... just repeat an instruction a few times in the overhead of loop1. And so we end up with this:

       INP
       BRZ output ; output immediately!
       SUB one

; a new outer loop: 
loop1  STA count1

       LDA start2
       LDA start2 ; dummy instructions to spend 5µs more...
       LDA start2 ; ...
       LDA start2 ; ...
       LDA start2 ; ...
       LDA start2 ; ...
       
loop2  STA count2
       LDA start3

loop3  STA count3
       LDA count3
       SUB one
       BRP loop3

       LDA count2
       SUB one
       BRP loop2

       LDA count1
       SUB one
       BRP loop1

output LDA one
       OUT
       HLT

one    DAT 1
start2 DAT 541 ; the two magic numbers
start3 DAT 459 ; ...
count1 DAT
count2 DAT
count3 DAT

We didn't speak yet of the overhead that is executed outside of loop1. This includes the BRZ, SUB (in case non-zero input) and two instructions in the output section (LDA and OUT), so 3-4µs.

This means that we get the following delays for each of the following inputs:

input | µs from INP to OUT
------+-------------------
    0 |        3
    1 |  1000004
    2 |  2000004
  ... | ...
    9 |  9000004
   10 | 10000004

You could even get rid of those extra 4 milliseconds by skipping 4 dummy instructions in the last iteration of the outer loop. So change this:

loop1  STA count1

       LDA start2
       LDA start2 ; dummy instructions to spend 5µs more...
       LDA start2 ; ...
       LDA start2 ; ...
       LDA start2 ; ...
       LDA start2 ; ...

to this:

loop1  STA count1

       BRZ skip   ; this branches only in the last iteration
       LDA start2 ; dummy instructions to spend 4µs more...
       LDA start2 ; ...
       LDA start2 ; ...
       LDA start2 ; ...
skip   LDA start2

So now the timing is exact, except for the case where the input is 0. There is no way to spend less than 3 instructions to deal with that case (zero detection, load 1, and output it).