
Possible to detect bit errors in memory in software?


A friend and I were curious whether you could detect levels of ionizing radiation by looking at the rate of single-bit errors in memory. I did a little research, and it seems most errors are caught and fixed at the hardware level. Would there be any way to detect errors in software (say, in C code on a PC)?


Solution

  • I'm sure it depends on the architecture you're running on, but I'm pretty certain you won't be detecting any single-bit errors in your memory any time soon. Many memory controllers, particularly in servers and workstations, implement some form of ECC protection to guard against the rare bit errors RAM chips have (a lot of consumer PCs ship without it). DDR RAM, for example, is very reliable compared to media like flash memory, which is spec'd to require a certain number of bits of ECC protection (somewhere between 8 and 16 or so) before the manufacturer guarantees functionality. As long as the number of bit errors stays below the correction limit, the bad bits will be corrected, and probably go unreported, before the data ever reaches software running on the CPU.

    Silent (unreported) data corruption from something as simple as a single-bit error is considered a huge "no-no" in the storage industry, so your memory manufacturer has probably done their darndest to keep your application from seeing it, much less having to deal with it!

    In any case, one common way to detect problems in any sort of memory is to run simple write-read-compare loops over the address space. Write 0's to all of your memory and read them back to detect data lines stuck at '1', write-read-compare F's to detect data lines stuck at '0', and run a data ramp to help detect addressing problems. The width of the data ramp should match the address size (i.e. 0x00, 0x01, 0x02... or 0x0000, 0x0001, 0x0002, etc.). You can do this kind of thing with storage performance benchmarking tools like Iometer or similar, although it may be just as easy to write the loops yourself.
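As a rough illustration of the write-read-compare loops described above, here is a minimal sketch in C. The function names (`test_fill`, `test_ramp`, `memtest_run`) and the 32-bit word width are my own assumptions, not anything from a particular tool. Keep in mind that on a PC with an operating system the buffer is ordinary virtual memory, reads may be served from the CPU cache rather than DRAM, and anything ECC has already corrected will never show up here, so treat this as a demonstration of the pattern rather than a real memory tester.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fill the buffer with one constant word, read it
 * back, and count the words that differ. The volatile qualifier stops the
 * compiler from optimizing the read-back away; it does not bypass the CPU
 * cache. */
size_t test_fill(volatile uint32_t *buf, size_t words, uint32_t pattern)
{
    size_t bad = 0;
    for (size_t i = 0; i < words; i++)
        buf[i] = pattern;
    for (size_t i = 0; i < words; i++)
        if (buf[i] != pattern)
            bad++;
    return bad;
}

/* Hypothetical helper: data ramp. Each word stores its own index, so an
 * addressing fault shows up as a value that belongs to another location. */
size_t test_ramp(volatile uint32_t *buf, size_t words)
{
    size_t bad = 0;
    for (size_t i = 0; i < words; i++)
        buf[i] = (uint32_t)i;
    for (size_t i = 0; i < words; i++)
        if (buf[i] != (uint32_t)i)
            bad++;
    return bad;
}

/* Run all three passes and return the total number of mismatching words. */
size_t memtest_run(volatile uint32_t *buf, size_t words)
{
    return test_fill(buf, words, 0x00000000u)  /* all 0's: finds bits stuck at 1 */
         + test_fill(buf, words, 0xFFFFFFFFu)  /* all F's: finds bits stuck at 0 */
         + test_ramp(buf, words);              /* ramp: finds addressing faults  */
}
```

To try it, allocate a buffer (for example with `malloc`), pass it and its length in 32-bit words to `memtest_run`, and treat any nonzero return value as a detected mismatch. On healthy memory the result should be 0.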