How to Increase the simulation speed of a gem5 run


I wish to simulate a fairly non-trivial program in the gem5 environment.

I have three files that I cross-compiled accordingly for the designated ISA:

  • main.c
  • my_library.c
  • my_library.h

I use the command

build/ARM/gem5.opt configs/example/se.py --cpu-type=TimingSimpleCPU -c test/test-progs/hello/src/my_binary

But is there a way, perhaps an argument to the se.py script, to make my simulation run faster?


Solution

  • The default commands are normally the fastest available (and therefore have the lowest simulation accuracy).

    gem5.fast build

    A .fast build can run about 20% faster, without losing simulation accuracy, by disabling some debug-related macros:

    scons -j `nproc` build/ARM/gem5.fast
    build/ARM/gem5.fast configs/example/se.py --cpu-type=TimingSimpleCPU \
      -c test/test-progs/hello/src/my_binary
    

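    The speedup is achieved by build-time changes such as disabling debug-related macros and enabling link time optimization (LTO), which make debugging harder and may slow down linking.

    In general, .fast is therefore not worth it while you are developing the simulator. It pays off once any patches you have are finished and you just need to run hundreds of simulations with different parameters as fast as possible.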
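
    For the hundreds-of-simulations case, a small driver script can sweep a parameter and give each run its own output directory. This is a minimal sketch, assuming the binary path from the question and the se.py cache options --caches and --l1d_size. The swept sizes, the m5out/l1d_* directory names and the DRY_RUN variable are made up for illustration:

```shell
#!/bin/sh
# Sweep the L1 data cache size, one gem5.fast run per value.
# DRY_RUN=1 (the default in this sketch) only prints the commands.
GEM5=build/ARM/gem5.fast
BINARY=test/test-progs/hello/src/my_binary

run_sweep() {
  for size in 16kB 32kB 64kB; do
    # --outdir keeps the stats.txt of each run in a separate directory
    cmd="$GEM5 --outdir=m5out/l1d_$size configs/example/se.py \
--cpu-type=TimingSimpleCPU --caches --l1d_size=$size -c $BINARY"
    if [ "${DRY_RUN:-1}" = 1 ]; then
      echo "$cmd"
    else
      # Left unquoted on purpose so the string splits into arguments
      $cmd
    fi
  done
}

run_sweep
```

    Setting DRY_RUN=0 in the environment runs the simulations instead of printing them. The runs are independent of each other, so they can also be started in parallel on a multi-core host.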

    TODO it would be good to benchmark which of the above changes matters the most for runtime, and if the link time is actually significantly slowed down by LTO.
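
    A first step towards such a benchmark is to time the same workload under both builds. This is a rough sketch, assuming both binaries have already been built. The time_cmd helper is hypothetical and only has one-second resolution, so it is only meaningful for runs that take a while:

```shell
#!/bin/sh
# Print the wall-clock time a command takes, in whole seconds.
time_cmd() {
  start=$(date +%s)
  "$@" > /dev/null 2>&1
  end=$(date +%s)
  echo $((end - start))
}

ARGS="configs/example/se.py --cpu-type=TimingSimpleCPU -c test/test-progs/hello/src/my_binary"

for build in opt fast; do
  gem5=build/ARM/gem5.$build
  if [ -x "$gem5" ]; then
    # $ARGS is left unquoted on purpose so it splits into separate arguments
    echo "$build: $(time_cmd "$gem5" $ARGS) s"
  else
    echo "$build: $gem5 not found, skipping"
  fi
done
```

    Repeating each measurement a few times and comparing the spread gives a better idea of whether the difference is real.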

    gem5 performance profiling analysis

    I'm not aware of any proper performance profiling of gem5 that assesses which parts of the simulation are slow and whether there is an easy way to improve them. Someone has to do that at some point and post it at: https://gem5.atlassian.net/browse/GEM5

    Options that reduce simulation accuracy

    Simulation would also be faster, at the cost of accuracy, without --cpu-type=TimingSimpleCPU:

    build/ARM/gem5.opt configs/example/se.py -c test/test-progs/hello/src/my_binary
    

    which then uses AtomicSimpleCPU, a CPU model with an even simpler memory access model.
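
    gem5 records how long the host needed for each run in the stats.txt file of the output directory, which makes the two CPU models easy to compare. This is a small sketch, assuming the statistic is called hostSeconds (recent releases) or host_seconds (older ones). The host_seconds_of helper and the m5out_atomic and m5out_timing directory names are hypothetical:

```shell
#!/bin/sh
# Print the host wall-clock seconds recorded in a gem5 stats.txt file.
host_seconds_of() {
  awk '$1 == "hostSeconds" || $1 == "host_seconds" { print $2; exit }' "$1"
}

for dir in m5out_atomic m5out_timing; do
  if [ -f "$dir/stats.txt" ]; then
    echo "$dir: $(host_seconds_of "$dir/stats.txt") s on the host"
  fi
done
```

    The two directories would come from running the command above once with --outdir=m5out_atomic and once with --outdir=m5out_timing plus --cpu-type=TimingSimpleCPU, where --outdir goes before configs/example/se.py.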

    Other lower accuracy but faster options include:

    • KVM, but support is not perfect as of 2020, and you need an ARM host to run the simulation on
    • Gabe's FastModel integration that is getting merged as of 2020, but it requires a FastModel license from ARM, which I think is too expensive for individuals

    Also if someone were to implement binary translation in gem5, which is how QEMU goes fast, then that would be an amazing option.

    Related

    Gem5 system requirements for decent performance