x86 simulation rtos osdev

How can CAS simulators like PTLsim achieve cycle-accurate simulation of x86 hardware?


Can somebody explain how cycle-accurate simulation (CAS) software like http://www.ptlsim.org/ works? How does it achieve cycle accuracy when neither the number of cycles per instruction nor the CPU's branch prediction logic is publicly known? Or is all of this available under NDA?

I suppose such simulators can be fairly accurate about cache hits and misses (maybe I'm wrong, but I think the internal cache algorithm is easier to predict). Without knowing how pipelining and superscalar execution are implemented, though, I don't see how a simulation can be cycle accurate.

Consequently, is it theoretically possible to implement a hard RTOS on x86-based hardware? Or does one have to sign some sort of NDA to be able to estimate the time required per instruction?


Solution

  • First Question: How do they achieve cycle accuracy when neither the number of cycles per instruction nor the CPU's branch prediction logic is known?

    The simulator provides cycle-accurate simulation given a sufficiently accurate CPU model, but it does not ship with out-of-the-box models for Intel's or AMD's current offerings. Someone at Intel or AMD with access to the required information could create an RTL-level model and get cycle-accurate simulations of current processors. People outside Intel and AMD cannot. You can still feed publicly known information to the simulator and get reasonable results, but these results will not be identical to the real hardware.

    If you are a software developer and want to benchmark real hardware, use real hardware! Simulators like PTLsim are designed for (academic) hardware developers who want to test new hardware features without spending hundreds of thousands of dollars on a new chip.
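To illustrate why the result is only as good as the model, here is a minimal sketch of a toy in-order timing model. It is not how PTLsim works internally. The `CpuModel` fields, the `simulate` function, the trace, and every latency value are made up for illustration, and the miss and mispredict flags are supplied by hand, whereas a real simulator would derive them from modeled caches and predictors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpuModel:
    """Hypothetical timing parameters; none of these are real Intel/AMD figures."""
    alu_latency: int
    load_hit_latency: int
    load_miss_penalty: int
    mispredict_penalty: int


def simulate(trace, model):
    """Return total cycles for a trace of (kind, flag) tuples.

    kind is "alu", "load", or "branch". For loads, flag means a cache miss;
    for branches, flag means a misprediction. Instructions run strictly one
    after another (no pipelining or superscalar overlap is modeled).
    """
    cycles = 0
    for kind, flag in trace:
        if kind == "alu":
            cycles += model.alu_latency
        elif kind == "load":
            cycles += model.load_hit_latency + (model.load_miss_penalty if flag else 0)
        elif kind == "branch":
            cycles += 1 + (model.mispredict_penalty if flag else 0)
        else:
            raise ValueError(f"unknown instruction kind: {kind}")
    return cycles


# The same instruction trace, timed under two different guesses at the hardware.
TRACE = [("alu", False), ("load", True), ("alu", False), ("branch", True), ("load", False)]

guess_a = CpuModel(alu_latency=1, load_hit_latency=3, load_miss_penalty=100, mispredict_penalty=12)
guess_b = CpuModel(alu_latency=1, load_hit_latency=4, load_miss_penalty=200, mispredict_penalty=20)

cycles_a = simulate(TRACE, guess_a)  # 121 cycles
cycles_b = simulate(TRACE, guess_b)  # 231 cycles
print(cycles_a, cycles_b)
```

Both runs are "cycle accurate" with respect to their own model, yet they disagree by almost a factor of two. Which one matches a real chip depends entirely on parameters that outsiders can only estimate.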

    Second Question: Is it theoretically possible to implement a hard RTOS on x86-based hardware?

    Of course it is theoretically possible. You would need to determine the absolute worst-case execution time of each code segment, for all inputs and under all circumstances. The practical problem is that processors like the Core 2 are very complex and their internal state is enormous. Additionally, these processors are not designed to behave deterministically with respect to timing, so a truly hard RTOS would have to be extremely conservative. Finally, as you correctly observe, people outside Intel and AMD don't have access to all the information required to make those conservative assumptions. In practice it is reasonable to pass on the latest and greatest CPUs and instead use older, simpler CPUs that have deterministic timing.

    On the other hand, if the RTOS does not have to be really hard real time, you can always just include some safety margin and hope for the best. ;-)
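The worst-case reasoning and the safety margin can be sketched as follows. This is a simplified illustration, not a real WCET analysis: the segment names, per-path cycle counts, clock rate, and deadline are all hypothetical, and `wcet_cycles` and `meets_deadline` are helper names invented for this example.

```python
def wcet_cycles(segments, margin_percent=0):
    """Sum the worst path of every segment, then add a safety margin.

    segments maps a segment name to the cycle counts of each possible path
    through it. The result is rounded up to a whole number of cycles.
    """
    worst = sum(max(paths) for paths in segments.values())
    return (worst * (100 + margin_percent) + 99) // 100


def meets_deadline(cycles, clock_hz, deadline_us):
    """True if `cycles` at `clock_hz` fit within `deadline_us` microseconds."""
    return cycles * 1_000_000 <= deadline_us * clock_hz


# Hypothetical measurements: cycle counts for each path through each segment.
SEGMENTS = {
    "read_sensor": [400, 1200],
    "filter": [900, 950, 3000],
    "actuate": [250],
}

CLOCK_HZ = 100_000_000  # assumed 100 MHz clock
DEADLINE_US = 50        # assumed 50 microsecond deadline (5000 cycles)

plain = wcet_cycles(SEGMENTS)                        # 1200 + 3000 + 250 = 4450
padded = wcet_cycles(SEGMENTS, margin_percent=50)    # 4450 * 1.5 = 6675

print(plain, meets_deadline(plain, CLOCK_HZ, DEADLINE_US))    # 4450 True
print(padded, meets_deadline(padded, CLOCK_HZ, DEADLINE_US))  # 6675 False
```

With these made-up numbers the task fits its deadline only if the measured worst cases are trusted exactly. Once a 50% margin is added to cover unknown timing behavior, it no longer fits, which is the kind of conservatism a truly hard guarantee forces on a complex CPU.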