
C++: Monitor file size. Can this be problematic?


I'm writing a GUI program that synchronizes files in a folder with a server. What I know about these files is that they are only ever written, never removed. My concern is that I might start uploading a file while it is still being written. To avoid this, I came up with the approach below, and I'd like an expert to tell me whether it is wrong.

I have an event loop with a timer. Every time the timer ticks, it checks whether new files have been added. If new files are found, I use this simple function to get the file size:

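#include <fstream>
#include <string>

// Returns the size in bytes, or 0 if the file cannot be opened.
std::size_t GetFileSize(const std::string &filename)
{
    std::ifstream file(filename.c_str(), std::ios::binary | std::ios::ate);
    if (!file) return 0;  // tellg() would return -1 here
    return static_cast<std::size_t>(file.tellg());
}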

Then I store each new file's name and size in a data structure of the following form (omitting std:: for readability, since it would otherwise appear five times in the next line):

deque<pair<string, pair<size_t, long> > > fileMonitor;

(Please suggest a better data structure if possible; unordered_multimap seems to do a similar job.)

This stores the file name (in the string), its size (in the size_t) and the number of times the size was checked without a change; let's call that checks. Every time the timer ticks, I look for new files and check whether the size of each file in fileMonitor has changed. For a single file, if the size differs from the previous tick, then checks = 1; if the size is the same, I do checks++.

In each iteration I then check whether the timer's interval*checks > timeout. If so, the file hasn't changed for long enough that I can judge it to be stable and no longer being updated.
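The bookkeeping described above can be sketched as follows. This is only an illustration under assumptions: the names FileState, StabilityTracker, Update and Remove are hypothetical, and a std::map keyed by file name is swapped in for the deque of nested pairs so that each file is looked up directly by name.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical per-file record (not from the original post).
struct FileState {
    std::size_t size;  // size seen on the most recent tick
    long checks;       // consecutive ticks on which that size was seen
};

// Hypothetical tracker implementing the "interval * checks > timeout" rule.
class StabilityTracker {
public:
    StabilityTracker(long intervalMs, long timeoutMs)
        : intervalMs_(intervalMs), timeoutMs_(timeoutMs) {}

    // Call once per timer tick with the size just measured.
    // Returns true once the size has been unchanged for longer than the timeout.
    bool Update(const std::string &name, std::size_t size)
    {
        std::map<std::string, FileState>::iterator it = files_.find(name);
        if (it == files_.end() || it->second.size != size) {
            FileState fresh = { size, 1 };
            files_[name] = fresh;   // new file or size changed: restart the count
        } else {
            ++it->second.checks;    // same size as on the previous tick
        }
        return intervalMs_ * files_[name].checks > timeoutMs_;
    }

    // Forget a file, for example after it has been uploaded.
    void Remove(const std::string &name) { files_.erase(name); }

private:
    long intervalMs_;
    long timeoutMs_;
    std::map<std::string, FileState> files_;
};
```

On each tick the caller would pass every file's current size (for example from GetFileSize) to Update and queue the file for upload when it returns true. The sketch inherits the limitation of the approach itself: a writer that pauses for longer than the timeout is indistinguishable from one that has finished.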

Obvious question: why don't I use something like inotify? Because I need something cross-platform and simple in structure, and I already know the behavior of the files I'm going to upload. Unfortunately Boost doesn't provide a solution for this, so I had to invent my own.


Solution

  • Do you have access to the writing program? If so, I would recommend first writing the data into a temporary file and renaming it only after writing has finished (a rename is more or less an atomic operation on a file system). Otherwise your "wait an appropriately long time for a change" approach always has the potential to fail, because you cannot tell why the writing program might leave the file unchanged for a long time.

    • Additions for the HDF5 format:

    A file may even change its content without changing its size, but:

    From https://www.hdfgroup.org/HDF5/doc/H5.format.html#FileMetaData:

    File Consistency Flags

    This value contains flags to indicate information about the consistency of the information contained within the file. Currently, the following bit flags are defined:

    Bit 0 set indicates that the file is opened for write-access.
    Bit 1 set indicates that the file has been verified for consistency and is guaranteed to be consistent with the format defined in this document.
    Bits 2-31 are reserved for future use.

    Bit 0 should be set as the first action when a file is opened for write access and should be cleared only as the final action when closing a file. Bit 1 should be cleared during normal access to a file and only set after the file's consistency is guaranteed by the library or a consistency utility.

    I would assume that the HDF5 APIs provide methods to open these files exclusively, and I would try that in addition to your polling approach.
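The write-then-rename suggestion from the first point could look roughly like this on the writer's side. It is a minimal sketch, assuming the writing program can be modified; the function name WriteThenRename and the ".part" suffix are hypothetical choices for illustration. Note that std::rename leaves the behavior implementation-defined when the target already exists, which should not matter here if the files are always new.

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical helper: writes data to "<finalName>.part" and renames it to
// finalName only after the stream has been flushed and closed successfully.
bool WriteThenRename(const std::string &finalName, const std::string &data)
{
    const std::string tempName = finalName + ".part";  // assumed suffix

    std::ofstream out(tempName.c_str(), std::ios::binary | std::ios::trunc);
    if (!out) return false;
    out << data;
    out.close();                       // sets failbit if the flush/close fails
    if (!out) {
        std::remove(tempName.c_str()); // do not leave a broken temp file behind
        return false;
    }

    // std::rename returns 0 on success.
    if (std::rename(tempName.c_str(), finalName.c_str()) != 0) {
        std::remove(tempName.c_str());
        return false;
    }
    return true;
}
```

With this in place, the synchronizing program would simply skip any name ending in ".part" and could upload every other file as soon as it appears, without a stability timeout.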