
(MIPS) Are some assembly instructions faster than others?


Are certain bare MIPS instructions faster than others? The question that sparked my interest was multiplying a register by a power of 2.

Let's assume $t0 holds a value small enough that multiplying it by 8 won't overflow. If I want to multiply that register by 8, is there any measurable performance difference between:

a left shift by 3 bits with sll:

    sll     $t0, $t0, 3

using the mul instruction (assume $t8 holds 8):

    mul     $t0, $t0, $t8

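or using the mult instruction? Since mult takes only two source operands and puts the 64-bit product in HI and LO, the result has to be copied back with mflo:

    mult    $t0, $t8
    mflo    $t0

The sll and mul versions are single instructions (mult needs the extra mflo), but I don't know whether any of them is faster than the others. Intuition makes me think mul is faster than mult, since there's no storage of the extraneous high bits into HI (is that correct?).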

Alternatively, does anyone know of any articles/webpages on the topic of individual instruction speed in assembly (MIPS or whatever)? I would imagine that different instructions use different circuitry/hardware, and so take different amounts of time to execute, but I can't seem to find any resources about this online.

I'm very new to MIPS/assembly, so please forgive me for not running a timing example (or for potentially using incorrect syntax in my examples above).


Solution

  • MIPS32™ Architecture For Programmers Volume II: The MIPS32™ Instruction Set,
    programming notes for the mul / mult instructions:

    Programming Notes:
    In some processors the integer multiply operation may proceed asynchronously and allow other CPU instructions to
    execute before it is complete. An attempt to read LO or HI before the results are written interlocks until the results are
    ready. Asynchronous execution does not affect the program result, but offers an opportunity for performance
    improvement by scheduling the multiply so that other instructions can execute in parallel.
    Programs that require overflow detection must check for it explicitly.
    Where the size of the operands are known, software should place the shorter operand in GPR rt. This may reduce the
    latency of the instruction on those processors which implement data-dependent instruction latencies.
    

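    So yes, multiplication by an arbitrary number is one of the very few things in MIPS that can take more cycles than other instructions. A shift such as sll, by contrast, is an ordinary ALU operation that completes in a single cycle on typical pipelined implementations.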
    As the manual specifies mul, it could be implemented as a mult followed by an mflo, in which case mul and mult would have exactly the same timing characteristics.

    It could also be a genuinely separate instruction, in which case it might be faster (perhaps by skipping computation of the high half, if only to save power), but I suspect few hardware implementations have done so.
    The multiply/divide unit is one of the poorer aspects of the MIPS architecture.