Search code examples
cpu-architectureenergy

For a Single Cycle CPU How Much Energy Required For Execution Of ADD Command


The question is obvious like specified in the title. I wonder this. Any expert can help?


Solution

  • OK, this is was going to be a long answer, so long that I may write an article about it instead. Strangely enough, I've been working on experiments that are closely related to your question -- determining performance per watt for a modern processor. As Paul and Sneftel indicated, it's not really possible with any real architecture today. You can probably compute this if you are looking at only the execution of that instruction given a certain silicon technology and a certain ALU design through calculating gate leakage and switching currents, voltages, etc. But that isn't a useful value because there is something always going on (from a HW perspective) in any processor newer than an 8086, and instructions haven't been executed in isolation since a pipeline first came into being.

    Today, we have multi-function ALUs, out-of-order execution, multiple pipelines, hyperthreading, branch prediction, memory hierarchies, etc. What does this have to do with the execution of one ADD command? The energy used to execute one ADD command is different from the execution of multiple ADD commands. And if you wrap a program around it, then it gets really complicated.

    SORT-OF-AN-ANSWER:

    So let's look at what you can do.

    • Statistically measure running a given add over and over again. Remember that there are many different types of adds such as integer adds, floating-point, double precision, adds with carries, and even simultaneous adds (SIMD) to name a few. Limits: OSs and other apps are always there, though you may be able to run on bare metal if you know how; varies with different hardware, silicon technologies, architecture, etc; probably not useful because it is so far from reality that it means little; limits of measurement equipment (using interprocessor PMUs, from the wall meters, interposer socket, etc); memory hierarchy; and more
    • Statistically measuring an integer/floating-point/double -based workload kernel. This is beginning to have some meaning because it means something to the community. Limits: Still not real; still varies with architecture, silicon technology, hardware, etc; measuring equipment limits; etc
    • Statistically measuring a real application. Limits: same as above but it at least means something to the community; power states come into play during periods of idle; potentially cluster issues come into play.

    When I say "Limits", that just means you need to well define the constraints of your answer / experiment, not that it isn't useful.

    SUMMARY: it is possible to come up with a value for one add but it doesn't really mean anything anymore. A value that means anything is way more complicated but is useful and requires a lot of work to find.

    By the way, I do think it is a good and important question -- in part because it is so deceptively simple.