Tags: linux, intel, cpu-architecture, temperature, lm-sensors

Ubuntu lm-sensors: large instantaneous temperature jumps on Intel Core i7


I am trying to do some data science with CPU core temperatures, so I need to monitor how core temperature changes over time. I am using two tools for this:

  1. lm-sensors for measuring core and package temperature
  2. stress for generating a load

The problem is that as soon as stress starts, the temperature skyrockets, and as soon as it stops, it plummets. This can't be right!

Here is a little shell script and output to demonstrate the problem:

Script:

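#!/bin/bash
# Snapshot core temperatures before, right after, and 1 s after a 1-second load
sensors | grep Core
stress -c 8 -t 1    # 8 CPU-spinning workers for 1 second
sensors | grep Core
echo "Sleeping for 1s"
sleep 1
sensors | grep Core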

Output:

Core 0:        +49.0°C  (high = +100.0°C, crit = +100.0°C)
Core 1:        +51.0°C  (high = +100.0°C, crit = +100.0°C)
Core 2:        +49.0°C  (high = +100.0°C, crit = +100.0°C)
Core 3:        +47.0°C  (high = +100.0°C, crit = +100.0°C)
stress: info: [6956] dispatching hogs: 8 cpu, 0 io, 0 vm, 0 hdd
stress: info: [6956] successful run completed in 1s
Core 0:        +81.0°C  (high = +100.0°C, crit = +100.0°C)
Core 1:        +73.0°C  (high = +100.0°C, crit = +100.0°C)
Core 2:        +73.0°C  (high = +100.0°C, crit = +100.0°C)
Core 3:        +68.0°C  (high = +100.0°C, crit = +100.0°C)
Sleeping for 1s 
Core 0:        +51.0°C  (high = +100.0°C, crit = +100.0°C)
Core 1:        +53.0°C  (high = +100.0°C, crit = +100.0°C)
Core 2:        +51.0°C  (high = +100.0°C, crit = +100.0°C)
Core 3:        +48.0°C  (high = +100.0°C, crit = +100.0°C)

Is this expected behavior? Is it physically possible for the temperature sensors to see that much change this quickly? If so, I'm in trouble in terms of characterizing temperature changes: there is no time for me to gather data. The temperature basically spikes instantaneously, doesn't change while the jobs are running, and then vanishes as soon as the job finishes.

I ran the same experiment on a Raspberry Pi, and the fully loaded quad core took about 60 seconds before frequency scaling set in, so I have no idea what's happening now that I am moving the project to a more complex architecture.

This is on an Intel Core i7 Skylake architecture. Any help understanding this would be greatly appreciated.


Solution

  • This is pretty normal. There isn't much thermal mass in the chip + heat sink, compared to the power that flows through it when it's > 50C above ambient, so it quickly reaches equilibrium.

    On my i7-6700k (Skylake quad-core desktop), starting a high-power process like video encoding (x264 or x265) ramps the cores up from ~25C idle (room temperature) to ~50 or 60C within a second; then they quickly settle near 70C or so, depending on whether they run at a max all-core turbo of 3.9 or 4.0GHz (selected via energy_performance_preference). (Since Skylake, Intel CPUs have hardware power management, so they can ramp up from idle clocks in microseconds, not milliseconds. Clock-speed decisions are made in hardware.)
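To see that sub-second ramp directly instead of three `sensors` snapshots, one option is to read the coretemp driver's hwmon files (the same source lm-sensors uses for its "Core" lines) in a tight loop. This is a minimal sketch, assuming the coretemp driver is loaded and exposes `temp*_input` files in millidegrees Celsius under `/sys/class/hwmon/hwmon*/`; `sample_temps` and `find_coretemp` are hypothetical helper names, and the matching `temp*_label` files tell you which column is the package and which are the cores.

```bash
# Hypothetical helper: sample_temps HWMON_DIR INTERVAL_S COUNT
# Prints one line per sample: elapsed seconds, then every temp*_input in whole °C.
# Assumes HWMON_DIR is the coretemp hwmon directory (see find_coretemp below).
sample_temps() {
  local dir=$1 interval=$2 count=$3
  local start now i f line
  start=$(date +%s.%N)                       # GNU date: seconds with nanoseconds
  for ((i = 0; i < count; i++)); do
    now=$(date +%s.%N)
    line=$(awk -v s="$start" -v n="$now" 'BEGIN { printf "%.3f", n - s }')
    for f in "$dir"/temp*_input; do
      line+=" $(( $(<"$f") / 1000 ))"        # sysfs reports millidegrees Celsius
    done
    echo "$line"
    sleep "$interval"                        # GNU sleep accepts fractions like 0.1
  done
}

# Hypothetical helper: print the hwmon directory whose "name" file says coretemp
find_coretemp() {
  local d
  for d in /sys/class/hwmon/hwmon*; do
    [[ $(<"$d/name") == coretemp ]] && { echo "$d"; return 0; }
  done
  return 1
}
```

For example, `dir=$(find_coretemp) && { stress -c 8 -t 5 & sample_temps "$dir" 0.1 80; }` logs about 8 seconds of data around a 5-second load. The effective sample rate will be somewhat below 1/INTERVAL, because each sample spawns `date` and `awk`.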

    > I mean that once it reaches its high-performance speed it stays there (3.1GHz). There is no frequency scaling to drop the frequency (like DVFS).

    If you mean throttling (down from max all-core turbo if that's higher than the rated / "guaranteed" sustained frequency), that depends on workload. To make enough heat to make turbo not sustainable, you need to run SIMD FMAs or something similarly high-power, not just a dummy loop. (e.g. Prime95 or video encoding.)
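To check whether a particular workload actually pulls the clock below max turbo, one rough approach is to watch the per-CPU "cpu MHz" field of `/proc/cpuinfo` while it runs. This is a sketch under the assumption that the field is meaningful on your kernel: on many kernels it is only an approximate, recently averaged frequency, so a dedicated tool such as `turbostat` gives more precise numbers. `core_mhz` is a hypothetical helper name.

```bash
# Hypothetical helper: core_mhz [CPUINFO_FILE]
# Prints "cpuN MHZ" for each "cpu MHz" line, in the order the CPUs are listed.
core_mhz() {
  # Split "cpu MHz<TAB><TAB>: 3400.000" on the colon and any following spaces
  awk -F': *' '/^cpu MHz/ { printf "cpu%d %s\n", n++, $2 }' "${1:-/proc/cpuinfo}"
}
```

Running `while sleep 1; do core_mhz | paste -sd' '; done` alongside a dummy loop and then alongside Prime95 or a video encode should make the difference between the two kinds of load visible.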

    Even Intel's stock cooler typically has enough cooling capacity to sustain some turbo with all cores busy on a lot of workloads, staying below sustained TDP. Or maybe your CPU's max all-core turbo isn't any higher than its rated speed. The i7-6700k's isn't: 4.0GHz for both. Only the 1- or 2-core turbo is 4.2GHz. (And that's not really limited by overall thermals, more by how fast the transistors are and/or by avoiding a hot spot on the one core that's active.)

    Of course the "k" models are overclockable, so the stock turbo settings are conservative, but I like to keep my fans quiet rather than have a burst of fan spin-up noise whenever a clunky web page loads.

    My cooler is a CoolerMaster Gemini II, big clunky thing with heat pipes and a big fan that (at room temp) barely turns, so mine has more thermal mass than a stock cooler. And the rear case fan literally stops when CPU / mobo temps are below ~40C, as I configured it in the BIOS.

    > I don't see what prevents the temperature from continuing to rise.

    Physics. A higher temperature difference (between chip and heat sink, and between heat sink and air) means more heat transfer per time (aka power). The thermal mass of the chip + heat sink is like a capacitor, the thermal connection from chip to air is like a resistor, and the constant heat power input is like current.
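Written out as a single-lump model (an idealization assumed here for illustration), with C the combined heat capacity of chip + heat sink (J/K), R the chip-to-air thermal resistance (K/W), P the dissipated power (W), T_amb the ambient temperature, and the chip starting at ambient:

```latex
C\,\frac{dT}{dt} = P - \frac{T - T_{\mathrm{amb}}}{R}
\qquad\Longrightarrow\qquad
T(t) = T_{\mathrm{amb}} + P R\left(1 - e^{-t/(RC)}\right)
```

The rise above ambient tends to PR, which is linear in P, with time constant τ = RC.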

    So the temperature asymptotically approaches equilibrium, just like in an RC circuit. The equilibrium point (above ambient) depends linearly on total power.
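A minimal numerical sketch of that RC behaviour in bash, with awk doing the floating-point arithmetic (bash itself only has integers). `rc_step` is a hypothetical helper, and the values in the usage line (65 W, 0.6 K/W, 50 J/K, 25 °C ambient) are made-up illustrative numbers, not measurements of any real CPU or cooler.

```bash
# Hypothetical helper: rc_step P R C T_AMB DT STEPS
# Integrates C*dT/dt = P - (T - T_AMB)/R with forward Euler, starting at ambient,
# and prints "time temperature" once per DT seconds for STEPS steps.
rc_step() {
  awk -v P="$1" -v R="$2" -v C="$3" -v Ta="$4" -v dt="$5" -v n="$6" 'BEGIN {
    T = Ta                                   # start at ambient temperature
    for (i = 0; i <= n; i++) {
      printf "%.1f %.2f\n", i * dt, T
      T += dt * (P - (T - Ta) / R) / C       # net heat flow / heat capacity
    }
  }'
}
```

For example, `rc_step 65 0.6 50 25 5 24` prints two minutes of warm-up in 5 s steps; the rise above ambient approaches P·R = 39 K, and rerunning with P = 130 doubles that rise, which is the linear dependence on total power. Forward Euler is only stable for DT < 2·R·C and only accurate when DT is much smaller than R·C (30 s here).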

    (Heat conduction and fan-forced convection scale linearly with temperature difference, just like electrical conductance / resistance. That is the dominant factor here, not radiative transfer, which scales with absolute T^4.)

    Also, fan speed is dynamic: it ramps up based on CPU temperature.

    BTW, I think the heat pipes on my cooler explain the very quick ramp-up to ~60C and then the gradual rise the rest of the way: the CPU itself can get hot very fast and starts transferring heat into the heat pipes (which go into the base of the cooler, so there's just some thermal paste and copper in between). A heat pipe can absorb heat directly by vaporizing its working fluid. But with sustained heat input, the heat has to go somewhere: into the mass of the fins, and from there to the air. So the gradual asymptotic increase may happen as the fins themselves heat up and have to dissipate heat into the air, not just conduct it out of the heat pipe.
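The fast-then-slow shape can be reproduced with a two-node extension of the single-lump RC model: a small die heat capacity coupled through one thermal resistance to a large heat-sink capacity, which in turn couples to the air. This is a sketch under loudly stated assumptions: `two_node` is a hypothetical helper, every parameter value is illustrative rather than measured, and it ignores fan-speed changes and heat-pipe phase change.

```bash
# Hypothetical helper: two_node P R_DS C_D R_SA C_S T_AMB DT STEPS EVERY
# Die (heat capacity C_D) -> R_DS -> heat sink (C_S) -> R_SA -> ambient air.
# Forward Euler; prints "time T_die T_sink" every EVERY steps.
two_node() {
  awk -v P="$1" -v Rds="$2" -v Cd="$3" -v Rsa="$4" -v Cs="$5" \
      -v Ta="$6" -v dt="$7" -v n="$8" -v every="$9" 'BEGIN {
    Td = Ta; Ts = Ta                         # die and sink start at ambient
    for (i = 0; i <= n; i++) {
      if (i % every == 0) printf "%.1f %.2f %.2f\n", i * dt, Td, Ts
      q_ds = (Td - Ts) / Rds                 # heat flow die -> sink (W)
      q_sa = (Ts - Ta) / Rsa                 # heat flow sink -> air (W)
      Td += dt * (P - q_ds) / Cd
      Ts += dt * (q_ds - q_sa) / Cs
    }
  }'
}
```

With the illustrative call `two_node 65 0.3 5 0.3 300 25 0.01 300000 100`, the die climbs to roughly 19 K above the sink within a few seconds, while the sink needs minutes to approach its own equilibrium; in equilibrium the die sits at T_amb + P·(R_DS + R_SA). In this model, removing P lets the die fall back to the still-cool sink just as quickly, which matches the sudden rise and fall around a 1-second `stress` run.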


    There are systems built without enough cooling to sustain max turbo. For x86, you'll find those in laptops, especially lightweight, ultra-portable laptops with Core-Y CPUs (TDP of around 7.5W, but still full Skylake cores with AVX2 that can turbo pretty high).

    The question "Why can't my ultraportable laptop CPU maintain peak performance in HPC" has some data showing clock speed falling off, and my answer there explains why systems are built this way: burst performance is what you want for interactive use, and the combination of light weight (fans / heat sinks) and high burst performance inevitably means they can't sustain their max turbo.

    But desktops can be heavy, and people do want machines that can crunch numbers for a long time at clock speeds as high as possible.