
Skipping store queue searches in gem5 when issuing loads causes crashes in the guest


I'm modifying gem5 (version 22.0.0.1) for a research project, and one thing I'm testing is special loads that, when issued, do not search the store queue for a potential store-to-load forward.

I've implemented this with a goto in LSQUnit::read in lsq_unit.cc that jumps from just before the loop that searches the store queue (starting at line 1390) to just past the end of that loop, where the cache access is performed (line 1568).

// load_idx = inst->lqIdx, passed in from LSQ::pushRequest
Fault
LSQUnit::read(LSQRequest *request, ssize_t load_idx)
{
    LQEntry& load_entry = loadQueue[load_idx];
    const DynInstPtr& load_inst = load_entry.instruction();

    load_entry.setRequest(request);

    ...

    auto store_it = load_inst->sqIt;
    assert (store_it >= storeWBIt);
    /*=== my goto === */
    if (load_inst->isSpecbCheck()){
        goto cache;
    }
    /*=== store queue search loop ===*/
    // End once we've reached the top of the LSQ
    while (store_it != storeWBIt && !load_inst->isDataPrefetch()) {
        // Move the index to one younger
        store_it--;
        assert(store_it->valid());
        assert(store_it->instruction()->seqNum < load_inst->seqNum);
        int store_size = store_it->size();

        ...

    }

    /*=== code which performs cache access ===*/
    cache:

    // If there's no forwarding case, then go access memory
    DPRINTF(LSQUnit, "Doing memory access for inst [sn:%lli] PC %s\n",
            load_inst->seqNum, load_inst->pcState());

    ...
}
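To make concrete what the goto bypasses, here is a toy model of the store queue search, a sketch under heavy simplifying assumptions rather than gem5 code. `ToyStore`, `LoadResult` and `issueLoad` are hypothetical names; real gem5 entries also track access sizes, byte masks, split requests and partial-forwarding cases. The load walks older stores from youngest to oldest (as `store_it--` does) and forwards from the first executed store with a matching address; with `skipSQSearch` set it reads memory directly, like the `isSpecbCheck()` goto.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, heavily simplified store queue entry (not gem5's SQEntry).
// The queue is ordered oldest -> youngest by seqNum.
struct ToyStore {
    uint64_t seqNum;  // program-order sequence number
    bool addrValid;   // has the store executed, i.e. is its address known?
    uint64_t addr;
    uint64_t data;
};

struct LoadResult {
    uint64_t value;
    bool forwarded;   // true if the value came from the store queue
};

// Toy version of the forwarding search done when a load issues.
// memValue stands in for whatever the cache/memory currently holds.
LoadResult issueLoad(const std::vector<ToyStore>& sq, uint64_t loadSeq,
                     uint64_t addr, uint64_t memValue, bool skipSQSearch) {
    if (!skipSQSearch) {
        // Walk from the youngest store towards the oldest, like store_it--.
        for (auto it = sq.rbegin(); it != sq.rend(); ++it) {
            if (it->seqNum >= loadSeq) continue;  // only stores older than the load
            if (it->addrValid && it->addr == addr)
                return {it->data, true};          // forward the store's data
        }
    }
    // "Cache access": stale if an older matching store has not written back.
    return {memValue, false};
}
```

For example, with one executed store `{10, true, 0x1000, 42}` and memory still holding 7, a normal load with sequence number 11 gets 42, while the same load with the search skipped gets the stale 7.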


For some reason, this change has caused gem5 to crash fairly often. I'm running SPEC CPU2017 using SimPoints, and around 40% of the generated checkpoints crash with the following error:

build/ARM/sim/faults.cc:102: panic: panic condition !handled && !tc->getSystemPtr()->trapToGdb(SIGSEGV, tc->contextId()) occurred: Page table fault when accessing virtual address 0x54000000540

This only happens with checkpoints containing my new load instruction opcode, not with checkpoints without it (which are otherwise identical). Furthermore, these failing checkpoints run fine again as soon as the goto statement is removed.

Does anyone have any idea why this could be happening? I've read through the store-forwarding code and can't figure out why skipping it altogether would cause any problems.

Thanks.


Solution

  • I did eventually manage to solve this, so I'll share what the problem was:

    In gem5, a memory order violation is detected when a store executes: the store searches the load queue for younger loads that have already issued and share an address with the store, and if it finds one, that load and everything younger than it are squashed. A load that issues after an older store has already executed is not covered by that check; it relies on its own store queue search to forward the store's value. I was disabling that search, expecting the violation to be found later at commit. It turns out, in gem5 at least, that the only check happens as the store executes and nowhere else, so any special loads that skipped the store queue search but really depended on an already-executed store were using stale values without causing any rollback!

    The fix was therefore to change how this violation search is done. In my case specifically, I still run the same store queue search code for my special loads, leave it out of the statistics as though the search had been skipped, and force a violation if a matching address is found.

    Edit: It was suggested that I share my gem5 fork for anyone who wants to see the code details: https://gitlab.com/muke101/pnd-loads/-/tree/main/gem5?ref_type=heads
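The detection gap and the workaround described in the answer can be sketched with a toy model. This is a hypothetical illustration, not gem5's `LSQUnit::checkViolations` or the code in the linked fork: `ToyLoad`, `ToyStore`, `ToyStats`, `checkViolationsOnStoreExecute` and `issueLoad` are invented names, addresses are compared exactly (gem5 uses a coarser granularity), and "squash" is reduced to returning the sequence number of the violating load. The first function only catches loads that issued before the store executed; the second shows special loads still searching the store queue, not being counted in the search statistic, and forcing a violation on a match.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, heavily simplified queue entries (not gem5's LQEntry/SQEntry).
// Both queues are ordered oldest -> youngest by seqNum.
struct ToyStore { uint64_t seqNum; bool addrValid; uint64_t addr; };
struct ToyLoad  { uint64_t seqNum; uint64_t addr; bool issued; };

// 1) Check run when a store executes: look for *younger* loads that have
//    already issued to the same address. A hit is a memory order violation;
//    that load and everything younger would be squashed.
std::optional<uint64_t> checkViolationsOnStoreExecute(
        const std::vector<ToyLoad>& lq, uint64_t storeSeq, uint64_t storeAddr) {
    for (const ToyLoad& ld : lq) {
        if (ld.seqNum <= storeSeq) continue;  // older loads don't depend on it
        if (ld.issued && ld.addr == storeAddr) return ld.seqNum;
    }
    return std::nullopt;  // loads issuing later are never re-checked here
}

// 2) Workaround at load issue: special loads still walk the store queue,
//    are not counted in the search statistic, and force a violation instead
//    of forwarding when an older executed store matches.
enum class LoadOutcome { ReadMemory, Forwarded, ForceViolation };
struct ToyStats { uint64_t sqSearches = 0; };

LoadOutcome issueLoad(const std::vector<ToyStore>& sq, uint64_t loadSeq,
                      uint64_t addr, bool isSpecialLoad, ToyStats& stats) {
    if (!isSpecialLoad) ++stats.sqSearches;  // special loads "pretend" to skip
    for (auto it = sq.rbegin(); it != sq.rend(); ++it) {  // youngest first
        if (it->seqNum >= loadSeq) continue;              // only older stores
        if (!it->addrValid || it->addr != addr) continue; // unknown or different
        return isSpecialLoad ? LoadOutcome::ForceViolation
                             : LoadOutcome::Forwarded;
    }
    return LoadOutcome::ReadMemory;
}
```

In this model, a load with sequence number 11 that issued before store 10 executed is reported by `checkViolationsOnStoreExecute`; if store 10 executes first, the check finds nothing, and only the load's own store queue search in `issueLoad` can catch the dependence.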