Tags: c, memory-management, struct, linked-list, heap-corruption

Can reading garbage memory break program flow?


In my application, I have a nested pair of loops which follow similarly nested linked lists in order to parse the data. I made a stupid blunder and cast one struct as the child struct, e.g.:

if (((ENTITY *) OuterEntityLoop->data)->visible == true) {

instead of:

if (((ENTITY_RECORD *) OuterEntityLoop->data)->entity->visible == true) {

This caused a problem where about 70% of runs would result in the application halting completely - not crashing, just sitting and spinning. Diagnostic printfs in program flow would fire in an odd order or not at all, and though it spontaneously recovered a couple of times, for the most part it broke the app.

So here's the thing. Even after paring down the logic inside the loop to be absolutely sure it wasn't looping infinitely because of a logic bug, to the point where the loop contained only my printf, it was still broken.

Thing two: when the struct was cast to the wrong type, the compiler still complained if I tried to access a member that the cast type doesn't declare, even though the data behind the pointer didn't actually contain the member it did accept.

My questions are:

  1. Why did this corrupt memory? Can simply reading garbage memory trash the program's control structures? If not, does this mean I still have a leak somewhere even though Electric Fence doesn't complain anymore?
  2. I assume that the reason it complained about a nonexistent property is because it goes by the type definition given, not what's actually there. This is less questionable in my mind now that I've typed it out, but I'd like confirmation that I'm not off base here.
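
The second point can be illustrated with a minimal sketch. The layouts of `ENTITY` and `ENTITY_RECORD` below are assumptions (the real definitions aren't shown in the question), and both helper functions are hypothetical names made up for illustration. The idea is that a member access is resolved against the type named in the cast: the compiler checks that the type declares the member, then reads at that member's fixed offset, whatever bytes happen to be stored there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layouts; the real ENTITY and ENTITY_RECORD are not shown. */
typedef struct {
    int  id;
    bool visible;
} ENTITY;

typedef struct {
    ENTITY *entity;  /* the record points at its entity */
    int     flags;
} ENTITY_RECORD;

/* Correct access: cast to the type actually stored, then follow the pointer. */
bool record_is_visible(void *data) {
    assert(data != NULL);
    return ((ENTITY_RECORD *) data)->entity->visible;
}

/* What the wrong cast, ((ENTITY *) data)->visible, would look at: the byte at
   offsetof(ENTITY, visible) inside the record. It is copied out with memcpy
   here so that the sketch itself stays well-defined. */
unsigned char byte_seen_by_wrong_cast(const void *data) {
    unsigned char raw;
    assert(data != NULL);
    assert(offsetof(ENTITY, visible) < sizeof(ENTITY_RECORD));
    memcpy(&raw, (const unsigned char *) data + offsetof(ENTITY, visible), 1);
    return raw;
}
```

With these assumed layouts, the wrongly cast expression would most likely end up testing a byte of the `entity` pointer or of `flags` rather than a real visibility flag. Writing `((ENTITY *) data)->entity` would be rejected at compile time, because `ENTITY` declares no member of that name, regardless of what `data` points to at run time.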

Solution

  • There's really no telling what will happen when a program accesses invalid memory, even for reading. On some systems, any memory read operation will either be valid or cause an immediate program crash, but on other systems it's possible that an erroneous read could be misinterpreted as a signal to do something. You didn't specify whether you're using a PC or an embedded system, but on embedded systems there are often many addresses by design which trigger various actions when they are read [e.g. dequeueing received data from a serial port, or acknowledging an interrupt]; an erroneous read of such an address might cause serial data to be lost, or might cause the interrupt controller to think an interrupt had been processed when it actually hadn't.

    Further, in some embedded systems, an attempt to read an invalid address may have other even worse effects that aren't really by design, but rather by happenstance. On one system I designed, for example, I had to interface a memory device which was a little slow to get off the bus following a read cycle. Provided that the next memory read was performed from a memory area which had at least one wait state or was on a different bus, there would be no problem. If code which was running in the fast external memory partition tried to read that area, however, the failure of the memory device to get off the bus quickly would corrupt some bits of the next fetched instruction. The net effect of all this was that accessing the slow device from code located in some places was no problem, but accessing it--intentionally or not--from code located in the fast partition would cause weird and non-reproducible failures.
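
The "read with side effects" case can be sketched in plain C. This is only a simulation under stated assumptions: `SIM_UART` and `sim_uart_read_data` are hypothetical names, not a real device or driver API, and on real hardware the register would typically be reached through a `volatile` pointer to a device-specific address instead of a function call. The point it shows is that the read itself changes state, so a stray read silently consumes data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simulated serial receiver; not a real device or driver API. */
typedef struct {
    unsigned char fifo[4];
    size_t head;   /* index of the next byte to hand out */
    size_t count;  /* number of bytes still queued */
} SIM_UART;

/* Models reading a receive-data register: the read itself dequeues a byte.
   Returns the byte, or -1 if nothing is queued. */
int sim_uart_read_data(SIM_UART *u) {
    unsigned char b;
    assert(u != NULL);
    if (u->count == 0) {
        return -1;
    }
    b = u->fifo[u->head];
    u->head = (u->head + 1) % sizeof u->fifo;
    u->count--;
    return b;
}
```

If a buggy code path calls `sim_uart_read_data` once by mistake on a queue holding `'a'`, `'b'`, `'c'`, the intended consumer starts at `'b'`: the first byte is lost even though nothing was ever written.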