
How to write x86 assembly code to check the effect of temperature on the performance of the processor


I have to write x86 assembly code that runs on an Intel x86 processor.

Specifically, I need to write sequences of instructions such as additions or moves, to see how those instructions affect the processor's performance with respect to temperature. That means my code should be capable of generating a controlled amount of heat from the processor.

If anyone has such code, or has experience writing this kind of code, please share.


Solution

  • For maximum heat, you want as many transistors as possible changing state every clock cycle. The floating-point FMA units have a lot of transistors; keeping them busy makes a lot of heat, especially with 256-bit AVX vectors.

    e.g. see the "stress testing" section of this Skylake overclocking guide, where you can see that Prime95 version 28 and Linpack are the hottest-running workloads. There's also a table of whole-system power consumption.
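As a rough illustration of keeping the FMA units busy, here is a minimal C sketch using AVX2/FMA intrinsics rather than hand-written assembly. The function names (`fma_burn`, `fma_burn_avx`, `fma_burn_scalar`), the choice of 8 independent accumulators, and the constants are all assumptions made for this example; it assumes GCC or Clang and should be compiled with optimization (e.g. `-O2`) so the accumulators stay in registers. It is not the code that Prime95 or Linpack run.

```c
#include <stdint.h>

#if defined(__x86_64__)
#include <immintrin.h>

/* Hypothetical helper: each iteration issues 8 independent 256-bit FMAs,
 * so the loop is limited by FMA throughput rather than FMA latency. */
__attribute__((target("avx2,fma")))
static double fma_burn_avx(uint64_t iters) {
    const __m256d mul = _mm256_set1_pd(0.5);
    const __m256d add = _mm256_set1_pd(1.0);
    __m256d acc[8];
    for (int k = 0; k < 8; k++)
        acc[k] = _mm256_set1_pd((double)(k + 1));

    for (uint64_t i = 0; i < iters; i++)
        for (int k = 0; k < 8; k++)
            acc[k] = _mm256_fmadd_pd(acc[k], mul, add); /* acc = acc*0.5 + 1 */

    /* Sum every lane so the compiler cannot discard the work. */
    double lanes[4], sum = 0.0;
    for (int k = 0; k < 8; k++) {
        _mm256_storeu_pd(lanes, acc[k]);
        sum += lanes[0] + lanes[1] + lanes[2] + lanes[3];
    }
    return sum;
}
#endif

/* Portable fallback with the same arithmetic on 32 scalar doubles. */
static double fma_burn_scalar(uint64_t iters) {
    double acc[32], sum = 0.0;
    for (int k = 0; k < 32; k++)
        acc[k] = (double)(k / 4 + 1);
    for (uint64_t i = 0; i < iters; i++)
        for (int k = 0; k < 32; k++)
            acc[k] = acc[k] * 0.5 + 1.0;
    for (int k = 0; k < 32; k++)
        sum += acc[k];
    return sum;
}

/* Picks the AVX2+FMA path when the CPU reports support for it. */
double fma_burn(uint64_t iters) {
#if defined(__x86_64__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma"))
        return fma_burn_avx(iters);
#endif
    return fma_burn_scalar(iters);
}
```

Calling `fma_burn` with a large iteration count from one thread per core would be the obvious way to scale this up. The recurrence is chosen only to keep values bounded; since the accumulators settle at a fixed value, a workload with changing data may toggle more bits and run hotter.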

    See also http://agner.org/optimize/ to learn more about CPU internals, especially Agner's microarch guide. You should be able to make more or less heat by writing a loop that either fits in the loop buffer or doesn't. The x86 decoders are much more power-intensive than reusing already-decoded uops. See this Q&A about uop throughput for various loop sizes, for the case where there aren't significant dependencies between the instructions so only the frontend limits throughput. (See also the tag wiki).


    I doubt you'll see much difference in heat between integer add reg, reg and mov reg, reg or similar. Maybe saturating the throughput of the integer mul unit would make a measurable heat / power difference, but the difference in cost between an adder and a mov or a simple boolean op is probably dwarfed by the power cost of out-of-order execution tracking the instruction through the pipeline.

    Loads or stores that keep the cache and store-buffer hardware active might be a different story, but add can have a memory source or dest too. Just make sure you don't bottleneck your loop on the store-forwarding latency of a single memory-destination add.
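To make the store-forwarding point concrete, here is a small C sketch with GNU inline asm that contrasts one memory-destination add repeated on a single location with the same instruction spread over several independent locations. The names `mem_add1`, `add_single`, `add_spread`, and the choice of 4 slots are assumptions for illustration; it assumes GCC or Clang on x86-64 and falls back to plain C elsewhere.

```c
#include <stdint.h>

#define SLOTS 4  /* assumed number of independent locations */

/* One memory-destination add: load, add, and store in a single instruction. */
static inline void mem_add1(uint64_t *p) {
#if defined(__x86_64__)
    __asm__ volatile("addq $1, %0" : "+m"(*p) : : "cc");
#else
    *(volatile uint64_t *)p += 1;  /* fallback: the compiler picks the instructions */
#endif
}

/* Every add depends on the previous store to the same address, so the loop
 * runs at roughly one add per store-forwarding latency. */
uint64_t add_single(uint64_t iters) {
    uint64_t counter = 0;
    for (uint64_t i = 0; i < iters; i++)
        mem_add1(&counter);
    return counter;
}

/* The adds target different addresses, so their dependency chains overlap
 * and the load/store hardware stays busier. */
void add_spread(uint64_t iters, uint64_t slots[SLOTS]) {
    for (uint64_t i = 0; i < iters; i++) {
        mem_add1(&slots[0]);
        mem_add1(&slots[1]);
        mem_add1(&slots[2]);
        mem_add1(&slots[3]);
    }
}
```

Timing both functions for the same number of total adds should show whether the single-location version is latency-bound on a given CPU; the sketch itself makes no claim about the resulting temperatures.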


    For minimum heat without actually sleeping, use the pause instruction in a loop. On Skylake, it sleeps much longer (~100 cycles) than on previous Intel microarchitectures (~5 cycles), IIRC.
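A pause loop of that kind might look like the following in C, using the `_mm_pause()` intrinsic, which compiles to the pause instruction on x86. The function name `pause_spin` and the `CPU_RELAX` macro are made up for this sketch; on non-x86 targets the macro expands to nothing.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define CPU_RELAX() _mm_pause()   /* emits the x86 pause instruction */
#else
#define CPU_RELAX() ((void)0)     /* no equivalent assumed on other targets */
#endif

/* Spins for `iters` iterations, executing one pause per iteration.
 * Returns the number of iterations completed. */
uint64_t pause_spin(uint64_t iters) {
    uint64_t done = 0;
    while (done < iters) {
        CPU_RELAX();
        done++;
    }
    return done;
}
```

Because the pause latency differs between microarchitectures, the same iteration count takes a different amount of wall-clock time on different CPUs, so a timed loop is probably a better fit when a fixed duration is needed.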

    According to powertop on Linux, the kernel uses mwait with different hints to enter different levels of sleep on Intel CPUs (e.g. my Skylake desktop). You might be able to do this from user-space if you want, or use nanosleep to alternate sleep/wake and run a heat-producing workload with a certain duty cycle.
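The duty-cycle idea could be sketched in C roughly as follows, using POSIX `clock_gettime` and `nanosleep`. The function name `run_duty_cycle`, its parameters, and the placeholder busy loop are assumptions for illustration; a real experiment would replace the placeholder with the heat-producing workload of interest.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdint.h>
#include <time.h>

/* Monotonic clock reading in nanoseconds. */
static int64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Runs `periods` periods of `period_ns` nanoseconds each. In every period the
 * thread is busy for a fraction `duty` (0.0 to 1.0) and sleeps for the rest.
 * Returns the total time spent in the busy phase, in nanoseconds. */
int64_t run_duty_cycle(double duty, int64_t period_ns, int periods) {
    if (duty < 0.0) duty = 0.0;
    if (duty > 1.0) duty = 1.0;
    const int64_t busy_ns = (int64_t)(duty * (double)period_ns);
    const int64_t idle_ns = period_ns - busy_ns;
    int64_t total_busy = 0;
    volatile uint64_t sink = 0;  /* placeholder workload state */

    for (int p = 0; p < periods; p++) {
        const int64_t start = now_ns();
        while (now_ns() - start < busy_ns)
            sink++;              /* placeholder: put the hot loop here */
        total_busy += now_ns() - start;

        if (idle_ns > 0) {
            struct timespec req;
            req.tv_sec = (time_t)(idle_ns / 1000000000LL);
            req.tv_nsec = (long)(idle_ns % 1000000000LL);
            nanosleep(&req, NULL);  /* lets the OS put the core to sleep */
        }
    }
    return total_busy;
}
```

For example, `run_duty_cycle(0.25, 100000000, 600)` would aim for about 25% load in 100 ms periods for roughly a minute. The actual sleep length depends on the OS timer and scheduler, so the achieved duty cycle is only approximate.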

    Sleeping frequently may prevent the OS from ramping the CPU up to full clock speed, depending on your setup. See: Why does this delay-loop start to run faster after several iterations with no sleep?

    For other ideas on reducing throughput in a loop, see Deoptimizing a program for the pipeline in Intel Sandybridge-family CPUs. Stalls that are just slow without flipping a lot of transistors to recover might be a good way to make a loop that doesn't make much heat.


    Without pause, you'll see significant heating from just a simple infinite loop like .repeat: jmp .repeat, especially on a CPU that can "turbo" up to a high voltage/frequency for as long as thermal limits allow.