Is this MIPS program supposed to take this long? (program that approximates pi)

Part 2 of this

The objective of that was to write a MIPS program that used that algorithm. I did. It works, only it takes about 30 minutes to complete with delta = 1E-2. The C program (compiled with gcc) takes about 1 minute and a half with that delta. I've tried with delta = 1E-3 on the C program but had to abort it after over 2 hours.

I just want to know: is this supposed to happen? The result looks accurate enough to me (3.13909200 with delta = 1E-2). Am I doing something wrong?

I know this algorithm is probably not the most efficient, neither is MIPS, or MARS (which I use for MIPS).

MIPS code:

    .data

l_cubo:     .double     1.0
delta:      .double     1E-2
zero:       .double     0.0
dois:       .double     2.0
six:        .double     6.0


    .text
    .globl main

main:
    la  $a0,l_cubo
    l.d $f20,0($a0) #l_cubo
    la  $a0,dois

    l.d $f4,0($a0)
    div.d   $f22,$f20,$f4   #r_esfera
    la  $a0,delta
    l.d $f24,0($a0) #delta
    la  $a0,zero
    l.d     $f26,0($a0)     #v_cubo ou v_total
    l.d $f28,0($a0)     #v_esfera

    l.d $f4,0($a0)      #x
    l.d $f6,0($a0)      #y
    l.d $f8,0($a0)      #z

loop_x:

    c.lt.d  $f4,$f20
    bc1f    end_loop_x
    l.d $f6,0($a0)
    loop_y:

        c.lt.d  $f6,$f20
        bc1f    end_loop_y
        l.d $f8,0($a0)
        loop_z:

            c.lt.d  $f8,$f20
            bc1f    end_loop_z
            add.d   $f26,$f26,$f24
            mov.d   $f12,$f4
            mov.d   $f14,$f6
            mov.d   $f30,$f8

            jal in_esfera


            l.d $f10,0($a0)

            beqz    $v0,continue

                add.d   $f28,$f28,$f24
            continue:
                add.d   $f8,$f8,$f24
                j loop_z
            end_loop_z:
        add.d   $f6,$f6,$f24
        j loop_y
        end_loop_y:
        add.d   $f4,$f4,$f24
        j loop_x
end_loop_x:

mul.d   $f24,$f24,$f24
mul.d   $f28,$f28,$f24
mul.d   $f26,$f26,$f24

div.d   $f28,$f28,$f26

la  $a0,six
l.d $f10,0($a0)
mul.d   $f28,$f28,$f10

li  $v0,3       #
mov.d   $f12,$f28   #
syscall         # print pi

li $v0,10       #
syscall         #exit

####################################

    .text
    .globl  in_esfera

in_esfera:

    sub.d   $f12,$f12,$f22
    mul.d   $f12,$f12,$f12
    sub.d   $f14,$f14,$f22
    mul.d   $f14,$f14,$f14
    sub.d   $f30,$f30,$f22
    mul.d   $f30,$f30,$f30
    add.d   $f30,$f12,$f30
    add.d   $f30,$f14,$f30

    mul.d   $f16,$f22,$f22

    li $v0,0
    c.le.d  $f30,$f16
    bc1f    continue2
        li $v0,1
    continue2:
        jr  $ra

I'm just wondering how my professor is going to correct a program that takes 30mins to execute.

Solution

I am assuming that this using the same algorithm as the C version. That approximates a value for Pi by testing a 3D grid of points in a cube to see if they are within a sphere. That is an O(N^3) computation where N is the number of units (deltas) in each dimension of the grid.

So ... yes ... it is expected for your MIPS code to take a long time to compute an accurate approximation of Pi.

If l_cubo is 4, and delta is 1/100, then you should be performing 400 x 400 x 400 = 64,000,000 iterations. 30 minutes seems excessive for that.
If l_cubo is 4, and delta is 1/1000, then you should be performing 4000 x 4000 x 4000 = 64,000,000,000 iterations.

But if you want to sanity check it, your MIPs code should be as fast, if not faster than, the C implementation when run on the same hardware with the same parameters. (Note: if you running the MIPS code on a MIPS emulator, you won't be able to do that.)