What are techniques for allowing safe software upgrades in embedded systems?

Upgrading software on embedded devices carries the risk of "bricking" the device, e.g. if power fails in the middle of writing software to flash. Two questions:

What are some best practices for implementing the upgrade mechanism so as to minimize the probability that the device will be "bricked"?
What are some best practices for making the upgrade process fail-safe, so that the device can recover from events such as a power failure while software is being written to flash?

Solution

It all depends on how critical the application is. The two basic approaches (backup and bootloader) are also combined sometimes.

Many systems have a read-only bootloader (such as RedBoot) and two banks of flash memory (most often on the same chip). The bootloader keeps a flag that selects which bank to boot from, and the flag changes in response to events such as a successful or failed upgrade.
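As a rough sketch of the bank selection described above, the bootloader could validate the flagged bank and fall back to the other one if validation fails. Everything here is an assumption for illustration: the bank_image_t layout, the function names, and the use of CRC-32 as the validity check are hypothetical and not taken from RedBoot or any particular bootloader.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_BANKS 2

/* Hypothetical per-bank descriptor; a real layout is device specific. */
typedef struct {
    const uint8_t *data; /* start of the image stored in this bank */
    size_t         len;  /* image length in bytes */
    uint32_t       crc;  /* CRC-32 recorded when the image was installed */
} bank_image_t;

/* Bitwise CRC-32 (reflected, polynomial 0xEDB88320), the same variant zlib uses. */
uint32_t crc32_calc(const uint8_t *p, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    while (len--) {
        crc ^= *p++;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* A bank is considered bootable only if its contents match the stored CRC. */
int bank_is_valid(const bank_image_t *b)
{
    return b->data != NULL && b->len > 0 &&
           crc32_calc(b->data, b->len) == b->crc;
}

/* Return the bank to boot: the flagged bank if valid, else the other bank,
 * else -1 when neither bank holds a valid image. */
int select_boot_bank(const bank_image_t banks[NUM_BANKS], int boot_flag)
{
    assert(boot_flag == 0 || boot_flag == 1);
    if (bank_is_valid(&banks[boot_flag]))
        return boot_flag;
    int other = 1 - boot_flag;
    if (bank_is_valid(&banks[other]))
        return other;
    return -1;
}
```

Because the bootloader itself is never rewritten, this check still runs even if an upgrade was interrupted halfway through a bank.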

So, when upgrading, the running version copies the new load into the backup bank, verifies its checksum, toggles the boot flag, and reboots the device. The device then boots from the new bank with the new load. After the reboot, the new load can copy itself into the other bank, which is now the backup.
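The ordering of those steps is what makes the upgrade safe: the boot flag is only touched after the new image has been written and verified. A minimal sketch, assuming the two banks can be modelled as RAM arrays and that the expected CRC-32 arrives with the image; device_t, install_update, and BANK_SIZE are hypothetical names, and real flash would need erase and program driver calls in place of memset and memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BANK_SIZE 64u /* tiny on purpose, for illustration only */

/* RAM stand-in for a device with two flash banks and a boot flag. */
typedef struct {
    uint8_t bank[2][BANK_SIZE];
    size_t  len[2];
    int     boot_flag; /* bank the bootloader should start */
} device_t;

/* Bitwise CRC-32 (reflected, polynomial 0xEDB88320). */
uint32_t crc32_calc(const uint8_t *p, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    while (len--) {
        crc ^= *p++;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Install an image into the inactive bank.
 * Returns 0 on success, -1 on failure. On failure the boot flag is
 * unchanged, so the device keeps booting the currently running load. */
int install_update(device_t *dev, const uint8_t *image, size_t len,
                   uint32_t expected_crc)
{
    assert(dev != NULL && image != NULL);
    if (len == 0 || len > BANK_SIZE)
        return -1;

    int target = 1 - dev->boot_flag; /* never write the running bank */

    /* 1. Erase and program the inactive bank. */
    memset(dev->bank[target], 0xFF, BANK_SIZE);
    memcpy(dev->bank[target], image, len);
    dev->len[target] = len;

    /* 2. Verify by reading back what was actually stored. */
    if (crc32_calc(dev->bank[target], len) != expected_crc)
        return -1;

    /* 3. Commit: flipping the flag is the last and smallest write. */
    dev->boot_flag = target;
    return 0;
}
```

If power fails during step 1 or 2, the flag still points at the old bank, so the next boot runs the old load and the upgrade can simply be retried.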

Often there is also a watchdog timer with a hardware reset. If the firmware goes insane and fails to kick the watchdog, the hardware reset reboots the device and the bootloader looks for a sane load.
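One way the bootloader might decide that a load is not sane after repeated watchdog resets is a boot-attempt counter. This is an assumption added for illustration, not something the answer above specifies: boot_state_t, MAX_BOOT_ATTEMPTS, and both function names are hypothetical, and on real hardware the state would live in non-volatile storage or a reset-safe register.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BOOT_ATTEMPTS 3 /* hypothetical limit before falling back */

/* State that must survive a reset. */
typedef struct {
    int boot_flag; /* bank currently selected for boot */
    int attempts;  /* boots since the load was last confirmed healthy */
} boot_state_t;

/* Run by the bootloader on every reset, including watchdog resets.
 * Returns the bank to boot. If the selected load has been started
 * MAX_BOOT_ATTEMPTS times without confirming itself, switch banks. */
int bootloader_on_reset(boot_state_t *s)
{
    assert(s != NULL);
    if (s->attempts >= MAX_BOOT_ATTEMPTS) {
        s->boot_flag = 1 - s->boot_flag; /* fall back to the other bank */
        s->attempts = 0;
    }
    s->attempts++;
    return s->boot_flag;
}

/* Called by the application once it has run long enough to be trusted,
 * for example after it has kicked the watchdog for some time. */
void app_confirm_healthy(boot_state_t *s)
{
    assert(s != NULL);
    s->attempts = 0;
}
```

A load that hangs before calling app_confirm_healthy is reset by the watchdog each time, and after the third unconfirmed boot the bootloader switches back to the other bank.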

The Open Mesh project is a good example of this approach.