
How do basic operations (arithmetic, element selection, max & comparison) compare in speed across programming languages?


[I'm not sure whether this is the right SE community for this question. Apologies if not, please redirect me to the right one]

If we consider some basic operations/operators common to most programming languages:

  • arithmetic ones like +, -, *, /, ^

  • comparison like ==, <, and also max, min

  • vector or matrix element selection, e.g. ...[...] or ...(...)

how do they compare in terms of speed, in general?

I understand that a precise answer may depend on the specific programming language, but I'd be happy to hear about general considerations and examples for such a comparison. Cheers!


Solution

  • As Eugene Sh. pointed out, this question is extremely broad, so I can't provide a complete answer here, but I'll try to give you enough context to point you in the right direction for further exploration.

    Low-level engineers measure the time of an operation in CPU clock cycles. Depending on the processor, you can expect anywhere from a few million clock cycles per second on a small microcontroller to around five billion (5 GHz) on a modern desktop CPU.

    In most compiled languages, such as C, +, -, ==, and < generally take a single CPU clock cycle when applied to integer arguments, and min/max compile down to a comparison plus a conditional move, which is nearly as cheap. Multiplication takes a few cycles; division is the notable outlier at tens of cycles on most hardware. Exponentiation is usually a library call (in C, ^ is XOR, not a power operator) and generally takes dozens of clock cycles.
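
    A rough way to check this on your own machine is to time a tight loop with the CPU's cycle counter. Here's a minimal C sketch using the x86-64 __rdtsc() intrinsic (GCC/Clang); the loop overhead and volatile loads/stores inflate the count, and rdtsc ticks at a fixed reference rate that may differ from the core clock, so treat the result as a rough upper bound:

        #include <stdio.h>
        #include <stdint.h>
        #include <x86intrin.h>  /* __rdtsc(); x86-64 GCC/Clang */

        int main(void) {
            volatile int64_t a = 123456789, b = 987654321, r = 0;
            enum { N = 100000000 };  /* 100 million iterations */

            uint64_t start = __rdtsc();
            for (int i = 0; i < N; i++)
                r = a + b;           /* the operation under test */
            uint64_t end = __rdtsc();

            printf("r=%lld, ~%.2f cycles per iteration\n",
                   (long long)r, (double)(end - start) / N);
            return 0;
        }

    Compile with gcc -O2; swapping + for * or / in the loop shows the relative costs directly.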

    Reading values from a complex data structure such as a vector or matrix may require fetching data from RAM, which typically takes on the order of 200 clock cycles on a modern x86-64 processor; when the data is already in a CPU cache, the fetch can drop to a handful of cycles.
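
    To see that latency concretely, a classic experiment is pointer-chasing: each load depends on the previous one, so caches and the hardware prefetcher can't hide the round-trip to RAM. A sketch using the same __rdtsc() intrinsic (exact numbers will vary by machine):

        #include <stdio.h>
        #include <stdint.h>
        #include <stdlib.h>
        #include <x86intrin.h>

        #define N (1u << 24)  /* 16M entries * 8 bytes = 128 MB, far larger than cache */

        int main(void) {
            size_t *next = malloc(N * sizeof *next);
            if (!next) return 1;

            /* Sattolo's algorithm: a random single-cycle permutation, so the
               chase below visits every entry in an unpredictable order. */
            for (size_t i = 0; i < N; i++) next[i] = i;
            for (size_t i = N - 1; i > 0; i--) {
                size_t j = (size_t)rand() % i;
                size_t t = next[i]; next[i] = next[j]; next[j] = t;
            }

            enum { STEPS = 1000000 };
            size_t p = 0;
            uint64_t start = __rdtsc();
            for (int s = 0; s < STEPS; s++)
                p = next[p];  /* one dependent, cache-missing load per step */
            uint64_t end = __rdtsc();

            printf("p=%zu, ~%.0f cycles per dependent load\n",
                   p, (double)(end - start) / STEPS);
            free(next);
            return 0;
        }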

    Notes about the above

    • Some processors, such as many ARM cores, have reduced instruction sets and lack hardware support for certain operations. For example, older ARM cores have no integer-divide instruction at all, so division must be done in software across many clock cycles (see the division sketch after these notes).
    • Some processors have expanded instruction sets, allowing them to do more complex arithmetic in a single clock cycle. I know there are some processors out there that can do exponentiation in a single clock cycle.
    • While the arithmetic itself may need only a few clock cycles inside the CPU, the computer often has to move data into or out of the CPU before it can operate on it. That data movement can add dozens or hundreds of clock cycles to the effective cost of the operation.
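
    As a concrete illustration of the division gap mentioned in these notes, compare what an optimizing compiler emits for the two C functions below (inspect the assembly with gcc -O2 -S):

        #include <stdint.h>

        /* Division by a runtime value: the compiler must emit a hardware
           divide (x86-64 div/idiv), which typically costs tens of cycles. */
        uint64_t div_by_var(uint64_t n, uint64_t d) {
            return n / d;
        }

        /* Division by a compile-time constant: optimizing compilers replace
           the divide with a multiply-and-shift sequence of ~1-cycle steps. */
        uint64_t div_by_ten(uint64_t n) {
            return n / 10;
        }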

    In most interpreted languages, such as Python, each arithmetic operation or data fetch can take hundreds or thousands of clock cycles. The exception is interpreted languages whose implementations compile parts of the code: modern JavaScript engines, for example, just-in-time (JIT) compile hot JavaScript into machine code, allowing most of its arithmetic to run in a single clock cycle.
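
    To make the interpreter overhead concrete, here is a toy C sketch of the kind of dispatch loop an interpreter runs for every source-level operation (a deliberately simplified stack machine, not any real engine's implementation). Even the cheapest opcode pays for dispatch, stack traffic, and bookkeeping; real interpreters such as CPython add dynamic type checks and object management on top of this:

        #include <stdio.h>
        #include <stdint.h>

        enum { OP_PUSH, OP_ADD, OP_PRINT, OP_HALT };

        int main(void) {
            /* bytecode for: print(2 + 40) */
            int64_t code[] = { OP_PUSH, 2, OP_PUSH, 40, OP_ADD, OP_PRINT, OP_HALT };
            int64_t stack[16];
            size_t pc = 0, sp = 0;

            for (;;) {
                switch (code[pc++]) {  /* dispatch: a branch per opcode */
                case OP_PUSH:
                    stack[sp++] = code[pc++];
                    break;
                case OP_ADD:  /* the add itself is one instruction; the
                                 surrounding bookkeeping is not */
                    sp--;
                    stack[sp - 1] += stack[sp];
                    break;
                case OP_PRINT:
                    printf("%lld\n", (long long)stack[sp - 1]);
                    break;
                case OP_HALT:
                    return 0;
                }
            }
        }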

    Edit: This post glosses over an extraordinary amount of nuance. It is limited in scope to SISD integer arithmetic; CPUs can do other kinds of arithmetic (floating-point, SIMD vector math) that I do not discuss here. Even within SISD integer arithmetic, I gloss over an enormous amount of detail. Commenters specifically point out the following:

    • Modern CPUs don't necessarily run code in the order it is written: out-of-order execution and hardware prefetching mean a CPU routinely starts computations and fetches data preemptively.
    • Because of the complexity of modern JavaScript implementations, JavaScript is not a strong example in this post. Implementations can compile JavaScript-level arithmetic down to CPU-level integer arithmetic, which generally runs in one clock cycle. However, many things have to go right for that optimization to fire, the first of which is that the JavaScript-level arithmetic actually operates on integers.