
Cause of IOBus Errors in gem5


I have run into this error several times while using gem5. The error that gets thrown is usually something along the lines of:

build/<ISA>/mem/xbar.cc:360: fatal: Unable to find destination for [addr:addr+size] on system.iobus

On inspection, the packet that causes this issue usually has a few characteristics in common:

  1. The error only occurs after booting from a checkpoint.
  2. It comes from the cache hierarchy, passes through the memory bus, and then reaches the IOBus (though the request originally comes from the processor). This isn't immediately obvious, because the packet only reaches the IOBus several events later (i.e., the code that originated it is not on the call stack in a debugging session).
  3. There is no good way to determine the intended device from the requesting packet (devices are specified according to the packet address).
  4. Non-kosher/hacky fixes (for example, building a response packet with dummy data and scheduling it as the response at the next Tick) just cause the packet to be resent to the IOBus indefinitely (basically, there is no easy fix, and they all epically fail).
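The wording of the fatal error follows from how a crossbar routes requests: each device attached to the bus advertises the address ranges it responds to, and a request is forwarded to whichever device's range covers the packet's [addr:addr+size] span. If no device claims it, there is nowhere to send the packet. The toy model below is a hypothetical and greatly simplified Python sketch of that idea, not gem5's C++ code (the error in the question is raised from mem/xbar.cc). The class names, device names, and addresses are all made up. It illustrates one plausible way a request that was routable with the bus layout at checkpoint time can fail when the restored system has a different set of devices.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class AddrRange:
    """Hypothetical half-open address range [start, end)."""
    start: int  # inclusive
    end: int    # exclusive

    def covers(self, addr: int, size: int) -> bool:
        # The whole access [addr, addr + size) must fit inside this range.
        return self.start <= addr and addr + size <= self.end


@dataclass
class ToyBus:
    """Hypothetical bus that routes purely by address, like a simplified crossbar."""
    name: str
    devices: Dict[str, List[AddrRange]] = field(default_factory=dict)

    def attach(self, device: str, *ranges: AddrRange) -> None:
        # Each attached device advertises the address ranges it responds to.
        self.devices[device] = list(ranges)

    def route(self, addr: int, size: int) -> str:
        # Forward to the first device whose advertised range covers the access.
        for device, ranges in self.devices.items():
            if any(r.covers(addr, size) for r in ranges):
                return device
        # No device claims the address: there is no destination for the packet.
        raise LookupError(
            f"Unable to find destination for [{addr:#x}:{addr + size:#x}] on {self.name}"
        )


# Bus layout when the checkpoint was taken (hypothetical devices and addresses).
ckpt_bus = ToyBus("system.iobus")
ckpt_bus.attach("dev_a", AddrRange(0x1000, 0x1100))
ckpt_bus.attach("dev_b", AddrRange(0x2000, 0x2100))

# Bus layout in the restored run, where dev_b is no longer attached.
run_bus = ToyBus("system.iobus")
run_bus.attach("dev_a", AddrRange(0x1000, 0x1100))

print(ckpt_bus.route(0x2010, 4))  # dev_b
try:
    run_bus.route(0x2010, 4)
except LookupError as err:
    print(err)  # Unable to find destination for [0x2010:0x2014] on system.iobus
```

Because routing depends only on the packet address and the currently attached devices, nothing in the packet itself records which device it was "meant" for, which matches point 3 above.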

From what I have found in the mailing list archive, changing the memory configuration seems to work, but there is no good explanation of why or when I should expect to see this error, or of how changing the configuration actually fixes it.

Is there any insight into why the IOBus in particular is affected by this corruption?


Solution

  • The first thing to check is the config.ini/config.json files in the output of the run that created the checkpoint and in the output of the run that crashed. What I have found is that, especially when using older checkpoints from different repos, the set of devices connected to the IOBus can change over time as the main gem5 branch develops.

    The likely reason changing the configuration seems to work is that it forces you to create a new checkpoint, in which the state of the devices attached to the IOBus is consistent between when the checkpoint is taken and when the evaluation is run. It has nothing to do with the new configuration itself, so feel free to re-create the checkpoint with the same memory configuration.
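One way to do that comparison is to diff the bus's entry in the two config.ini files. The Python sketch below is a hypothetical helper, not a gem5 tool. It assumes the bus is named system.iobus (as in the error message), that config.ini can be parsed with Python's standard configparser module, and that the port entries in the bus's section list the objects attached to it; check these assumptions against your own files. It prints the keys of the [system.iobus] section whose values differ, plus any SimObject sections that exist in only one of the two files.

```python
import configparser
import sys
from typing import Dict, List, Optional, Set, Tuple

# Assumed bus name, taken from the fatal error message.
BUS = "system.iobus"


def parse_config(text: str) -> configparser.ConfigParser:
    # Treat config.ini as INI: only '=' separates keys from values, no '%'
    # interpolation, repeated keys tolerated, and key case kept as written.
    parser = configparser.ConfigParser(
        delimiters=("=",), interpolation=None, strict=False
    )
    parser.optionxform = str
    parser.read_string(text)
    return parser


def load_config(path: str) -> configparser.ConfigParser:
    with open(path, encoding="utf-8") as f:
        return parse_config(f.read())


def diff_section(
    old: configparser.ConfigParser, new: configparser.ConfigParser, section: str
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Map each key whose value differs in `section` to (old value, new value)."""
    old_kv = dict(old[section]) if old.has_section(section) else {}
    new_kv = dict(new[section]) if new.has_section(section) else {}
    return {
        key: (old_kv.get(key), new_kv.get(key))
        for key in sorted(old_kv.keys() | new_kv.keys())
        if old_kv.get(key) != new_kv.get(key)
    }


def diff_sections(
    old: configparser.ConfigParser, new: configparser.ConfigParser
) -> Tuple[Set[str], Set[str]]:
    """Return (sections only in old, sections only in new)."""
    old_s, new_s = set(old.sections()), set(new.sections())
    return old_s - new_s, new_s - old_s


def main(argv: List[str]) -> int:
    if len(argv) != 3:
        print(f"usage: {argv[0]} <checkpoint-run config.ini> <crashing-run config.ini>")
        return 2
    old, new = load_config(argv[1]), load_config(argv[2])
    for key, (before, after) in diff_section(old, new, BUS).items():
        print(f"[{BUS}] {key}\n  checkpoint run: {before}\n  crashing run:   {after}")
    only_old, only_new = diff_sections(old, new)
    for name in sorted(only_old):
        print(f"only in checkpoint run: {name}")
    for name in sorted(only_new):
        print(f"only in crashing run:   {name}")
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

Usage would be something like `python diff_iobus.py <checkpoint-run-outdir>/config.ini <crashing-run-outdir>/config.ini` (the script name is hypothetical). If the bus's port lists or the set of device sections differ, the checkpoint was probably taken with a different set of devices on the IOBus than the current run has.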