I want to boot the Linux kernel in full system (FS) mode with a lightweight CPU to save time, make a checkpoint after boot finishes, and then restore the checkpoint with a more detailed CPU to study a benchmark, as mentioned at: http://gem5.org/Checkpoints
However, when I tried to use -r 1 --restore-with-cpu=, I could not observe any cycle difference between the new and old CPU.
The measure I'm looking at is how cache sizes affect the number of cycles that a benchmark takes to run.
The setup I'm using is described in detail at: Why doesn't the Linux kernel see the cache sizes in the gem5 emulator in full system mode? I'm looking at the cycle counts because I can't see cache sizes directly with the Linux kernel currently.
For example, if I boot the Linux kernel from scratch with the detailed and slow HPI model, with a command like (excerpt):
./build/ARM/gem5.opt --cpu-type=HPI --caches --l1d_size=1024 --l1i_size=1024 --l2cache --l2_size=1024 --l3_size=1024
and then vary the cache sizes, the benchmark does get faster as the caches get larger, as expected.
However, if I first boot without --cpu-type=HPI, which uses the faster AtomicSimpleCPU model:
./build/ARM/gem5.opt --caches --l1d_size=1024 --l1i_size=1024 --l2cache --l2_size=1024 --l3_size=1024
and then create the checkpoint with m5 checkpoint and try to restore with the detailed CPU:
./build/ARM/gem5.opt --restore-with-cpu=HPI -r 1 --caches --l1d_size=1024 --l1i_size=1024 --l2cache --l2_size=1024 --l3_size=1024
then changing the cache sizes makes no difference: I always get the same cycle counts as with the AtomicSimpleCPU, indicating that the restore did not actually switch to the new CPU.
The same happens on x86 if I try to switch from AtomicSimpleCPU to DerivO3CPU.
Related old thread on the mailing list: http://thread.gmane.org/gmane.comp.emulators.m5.users/14395
Tested at: fbe63074e3a8128bdbe1a5e8f6509c565a3abbd4
--cpu-type= affected the restore, but --restore-with-cpu= did not
I am not sure why that is, but I have empirically verified that if I instead do:
-r 1 --cpu-type=HPI
then, as expected, the cache size options start to affect cycle counts: larger caches lead to fewer cycles.
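Concretely, the restore command that worked for me is the boot command plus -r 1 and --cpu-type=; a sketch along these lines, where the configs/example/fs.py path is my assumption for the stock full-system config script and the cache flags just mirror the boot command above:

```shell
# Restore checkpoint 1, switching the CPU model via --cpu-type=
# instead of --restore-with-cpu=. The cache options must be passed
# again, since cache state is not part of the checkpoint.
./build/ARM/gem5.opt configs/example/fs.py \
    -r 1 \
    --cpu-type=HPI \
    --caches --l1d_size=1024 --l1i_size=1024 \
    --l2cache --l2_size=1024 --l3_size=1024
```

With this, varying the cache size options changed the cycle counts for me as expected.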
Also keep in mind that caches barely affect the AtomicSimpleCPU, so there is not much point in having them with that model.
TODO: so what is the point of --restore-with-cpu= vs --cpu-type= if --restore-with-cpu= didn't seem to do anything in my tests?
Except to confuse me, since if --cpu-type != --restore-with-cpu, then the cycle count appears under system.switch_cpus.numCycles instead of system.cpu.numCycles.
I believe this is what is going on (yet untested):

- switch_cpus contains stats for the CPU you switched to
- if --restore-with-cpu= != --cpu-type, gem5 thinks you have already switched CPUs from the start
- --restore-with-cpu has no effect on the initial CPU. It only matters for options that switch the CPU during the run itself, e.g. --fast-forward and --repeat_switch. This is where you will see both cpu and switch_cpus stats get filled up.

TODO: also, if I add or remove --restore-with-cpu=, there is a small 1% cycle difference. But why is there a difference at all? The AtomicSimpleCPU cycle count is completely different, so it cannot be that gem5 is falling back to it.
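For completeness, the kind of run where --fast-forward switches CPUs mid-simulation and fills up switch_cpus can be sketched as follows; the configs/example/fs.py path and the instruction count are illustrative assumptions, not something I have verified here:

```shell
# Boot under the fast AtomicSimpleCPU, then switch to the detailed HPI
# model after 1 billion instructions; after the switch, stats accumulate
# under system.switch_cpus rather than system.cpu.
./build/ARM/gem5.opt configs/example/fs.py \
    --cpu-type=HPI \
    --fast-forward=1000000000 \
    --caches --l1d_size=1024 --l1i_size=1024 \
    --l2cache --l2_size=1024
```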
--cpu-type= vs --restore-with-cpu= showed up in fs.py --fast-forward: https://www.mail-archive.com/gem5-users@gem5.org/msg17418.html
Confirm what is happening with logging
One good sanity check that the CPU you want is actually being used is to enable some logging, as shown at: https://github.com/cirosantilli/linux-kernel-module-cheat/tree/bab029f60656913b5dea629a220ae593cc16147d#gem5-restore-checkpoint-with-a-different-cpu e.g.:
--debug-flags ExecAll,FmtFlag,O3CPU,SimpleCPU
and then see if you start to get O3 messages rather than SimpleCPU ones.
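A full invocation might look as follows, assuming the x86 AtomicSimpleCPU-to-DerivO3CPU scenario from earlier; the config path and the exact set of flags are my assumptions:

```shell
# Restore with debug output enabled. The O3CPU and SimpleCPU debug
# flags only print messages from the respective CPU model, so grepping
# the log shows which model is actually executing after the restore.
./build/X86/gem5.opt \
    --debug-flags=ExecAll,FmtFlag,O3CPU,SimpleCPU \
    configs/example/fs.py -r 1 --cpu-type=DerivO3CPU --caches \
    2>&1 | grep -E 'O3CPU|SimpleCPU' | head
```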