c linux data-synchronization

What happens if I turn off a device before "sync"?


I have an embedded device (Linux + BusyBox) on which I loaded a file "my_file.txt". I used cat to check the contents of the file, and it was satisfactory. Then, I pulled the plug and after reboot, I saw the file was still there, but with 0 bytes size...

Can that be caused by an unsynced filesystem? This is really a two-part question:

  1. Do the creation of a file and the copy of its contents happen in different stages? (allowing a phase where a file with 0 bytes exists)
  2. Is it possible that I "see" the file [meaning I successfully ran cat on "my_file.txt"], but what I actually see is a cached version whose contents will be gone after a reboot unless sync is called? (The file itself remains; only its contents are lost.)

BTW, when does Linux flush filesystems? I know that stdout, for example, is flushed (by default) when a "\n" is introduced [and can be configured somehow, don't remember exactly how]. Is there a rule for filesystems as well?


Solution

  • Do the creation of a file and the copy of its contents happen in different stages? (allowing a phase where a file with 0 bytes exists)

    Yes. The normal operations on a file are:

    1. open/create the file
    2. read/write data
    3. close the file
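To make the "0 bytes" phase concrete, here is a minimal sketch, assuming a POSIX environment (which Linux + BusyBox provides). The function name `size_right_after_create` and the sample payload are made up for illustration. It performs the three steps above and reports the file size as seen between step 1 and step 2.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: creates (or truncates) `path`, records its size
 * before any data is written, then writes and closes as usual.
 * Returns the size observed right after creation, or -1 on error. */
long size_right_after_create(const char *path)
{
    struct stat st;
    long size;

    /* Step 1: open/create. The directory entry exists from here on. */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* No write() has happened yet, so the file is empty at this point. */
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    size = (long)st.st_size;

    /* Step 2: write the data (into the kernel's cache, not yet the device). */
    if (write(fd, "hello\n", 6) != 6) {
        close(fd);
        return -1;
    }

    /* Step 3: close. close() by itself does not force data to the device. */
    if (close(fd) != 0)
        return -1;

    return size;
}
```

Calling `size_right_after_create("my_file.txt")` should return 0 even though the finished file holds 6 bytes. If power is lost after the creation has reached the storage device but before the data has, an empty file is what remains.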

    Is it possible that I "see" the file [meaning I successfully ran cat on "my_file.txt"], but what I actually see is a cached version that will not be there after reboot unless sync is called?

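    Yes. cat reads through the kernel's cache, so it can show you the data even if none of it has reached the storage device yet. If step 1 above was synced to the hard drive but step 2 was not, you lose the file content and are left with an empty file.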
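If a program on the device writes the file itself, it can ask for the data to be pushed out before it reports success. The following is a sketch under stated assumptions rather than a definitive recipe: `write_durably` and its parameters are hypothetical names, and it relies only on the POSIX calls `open`, `write`, `fsync` and `close`. It syncs the file's data and then the containing directory, so that the directory entry is flushed as well.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: writes `len` bytes from `buf` to `path` and asks the
 * kernel to flush both the file and its parent directory `dir` to the device.
 * Returns 0 on success, -1 on any error. */
int write_durably(const char *dir, const char *path,
                  const void *buf, size_t len)
{
    const char *p = buf;
    int fd, dfd, rc;

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* write() may accept fewer bytes than requested, so loop until done. */
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Flush this file's data and metadata from the kernel to the device. */
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    if (close(fd) != 0)
        return -1;

    /* Flush the directory so the new entry itself is on the device too. */
    dfd = open(dir, O_RDONLY);
    if (dfd < 0)
        return -1;
    rc = fsync(dfd);
    close(dfd);
    return rc == 0 ? 0 : -1;
}
```

For example, `write_durably(".", "my_file.txt", data, data_len)` returns 0 only after both `fsync` calls have succeeded. Whether the data has then truly reached permanent storage still depends on the drive's own cache.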

    BTW, when does Linux flush filesystems? I know that stdout, for example, is flushed (by default) when a "\n" is introduced [and can be configured somehow, don't remember exactly how]. Is there a rule for filesystems as well?

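    No, there is no single general rule; it's complicated. The kernel and the filesystem cache data in RAM and write it to disk when their internal algorithms decide it is a good time to do so. On Linux, background writeback timing is influenced by tunables such as vm.dirty_expire_centisecs and vm.dirty_writeback_centisecs, but a program should not rely on data being written at any particular moment.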

    Note that flushing/syncing happens at several levels. The flush you mention, "when a "\n" is introduced", only moves data from the program's own buffer down to the operating system. The operating system may then keep the data in RAM and write it to the hard drive later. The hard drive may in turn hold it in its onboard RAM cache and commit it to permanent storage later still.
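The first two levels can be sketched in C as follows. This is an illustrative sketch, assuming a POSIX C library; `flush_to_device` and `make_line_buffered` are hypothetical helper names. `setvbuf` is the standard call that configures stdio buffering (the "can be configured somehow" part of the question). Note that stdout is line-buffered by default only when it is attached to a terminal.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: pushes data written to `fp` down two levels.
 * Returns 0 on success, -1 on error. */
int flush_to_device(FILE *fp)
{
    /* Level 1: the program's stdio buffer -> the kernel. */
    if (fflush(fp) != 0)
        return -1;

    /* Level 2: the kernel's cache -> the storage device. */
    if (fsync(fileno(fp)) != 0)
        return -1;

    /* Level 3 (the drive's onboard cache) is outside the program's
     * direct control. */
    return 0;
}

/* Hypothetical helper: selects line buffering for a stream. Must be called
 * before any other I/O on `fp`. _IONBF (unbuffered) and _IOFBF (fully
 * buffered) are the other standard modes. */
int make_line_buffered(FILE *fp)
{
    return setvbuf(fp, NULL, _IOLBF, BUFSIZ) == 0 ? 0 : -1;
}
```

A program would typically call `make_line_buffered(fp)` right after `fopen`, and `flush_to_device(fp)` at the points where it needs the data to survive a power cut.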

    Normally you can run the sync command on the command line to ensure that all cached data is written from the OS to the hard drive. (On low-end hard drives whose onboard RAM cache is not battery-backed, data still sitting in that cache can be lost if power is cut.)