I'm writing a GUI program that synchronizes the files in a folder with a server. What I know about these files is that they are only ever written, never removed. My concern is that I might start uploading a file while it is still being written. To avoid this, I came up with the approach below, and I would like an expert to tell me whether it is wrong.
I have an event loop with a timer. Every time the timer ticks, I check whether new files have been added. If new files are found, I use this simple function to get the file size:
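    #include <fstream>
    #include <string>

    // Returns the current size in bytes, or 0 if the file cannot be opened.
    std::size_t GetFileSize(const std::string &filename)
    {
        std::ifstream file(filename.c_str(), std::ios::binary | std::ios::ate);
        if (!file) return 0;  // otherwise tellg() returns -1, which wraps to a huge size_t
        return static_cast<std::size_t>(file.tellg());
    }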
Then I store each new file's name and size in a data structure of the following form (omitting the std:: prefix to keep it readable, since it would otherwise appear five times on the next line):
    deque<pair<string, pair<size_t, long> > > fileMonitor;
(Please suggest a better data structure if possible; unordered_multimap seems to do a similar job.)
This stores the file name (the string), its size (the size_t), and the number of times the size was checked without a change; let's call that checks. Every time the timer ticks, I look for new files and check whether the size of each file in fileMonitor has changed. For a single file, if the size differs from the previous one, I set checks = 1; if the size is the same, I do checks++.
Then, in each iteration, I check whether the timer's interval * checks > timeout. If so, the file hasn't changed for long enough that I can judge it stable and no longer being updated.
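The bookkeeping described above could be sketched roughly as follows. This is only an illustration, not the asker's actual code: the names FileState, StabilityTracker, Observe and Forget are hypothetical, the millisecond units are an assumption, and an unordered_map keyed by file name is swapped in for the deque of nested pairs so that each file is found by name directly.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical record for one watched file (names are illustrative).
struct FileState {
    std::uintmax_t size = 0;  // size seen at the last tick
    long checks = 0;          // consecutive ticks with an unchanged size
};

class StabilityTracker {
public:
    // Both values are assumed to be in milliseconds.
    StabilityTracker(long intervalMs, long timeoutMs)
        : intervalMs_(intervalMs), timeoutMs_(timeoutMs) {}

    // Record the size observed for `name` on this tick.
    // Returns true once interval * checks exceeds the timeout.
    bool Observe(const std::string &name, std::uintmax_t size) {
        FileState &st = files_[name];  // inserts a zeroed entry for a new file
        if (st.checks == 0 || st.size != size) {
            st.size = size;
            st.checks = 1;             // first sighting or size changed: restart the count
        } else {
            ++st.checks;               // size unchanged since the previous tick
        }
        return intervalMs_ * st.checks > timeoutMs_;
    }

    // Stop tracking a file, for example once it has been queued for upload.
    void Forget(const std::string &name) { files_.erase(name); }

private:
    long intervalMs_;
    long timeoutMs_;
    std::unordered_map<std::string, FileState> files_;
};
```

On each timer tick, the caller would pass every file's current size to Observe and start the upload for those where it returns true, then call Forget on them.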
Obvious question: why don't I use something like inotify? Because I need something cross-platform and simple in structure, and I already know the behavior of the files I'm going to upload. Unfortunately Boost doesn't provide a solution for this, so I had to invent my own.
Do you have access to the writing program? In that case I would recommend first writing the data into a temporary file and renaming it only after writing has finished (a rename is kind of an atomic operation on a file system). Otherwise your "wait an appropriately long time for a change" approach always has the potential to fail, because you cannot tell why the writing program might leave the file unchanged for a long time.
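On the writer's side, the write-then-rename idea might look like the sketch below. It assumes you can modify the writing program; the function name WriteThenRename and the ".part" suffix are made up for illustration. std::rename is used for brevity. Whether the rename replaces an existing target, and how atomic it is, depends on the platform and file system, so treat this as a starting point rather than a guarantee.

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical writer-side helper: the ".part" suffix is an illustrative convention.
bool WriteThenRename(const std::string &finalName, const std::string &data) {
    const std::string tempName = finalName + ".part";
    {
        std::ofstream out(tempName.c_str(), std::ios::binary | std::ios::trunc);
        if (!out) return false;               // could not create the temporary file
        out.write(data.data(), static_cast<std::streamsize>(data.size()));
        out.close();
        if (!out) return false;               // write or close failed
    }
    // Publish under the final name only after all data has been written.
    return std::rename(tempName.c_str(), finalName.c_str()) == 0;
}
```

The synchronizing program would then skip anything ending in ".part" and could upload every other file as soon as it appears.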
A file may even change its content without changing its size. However, from the HDF5 file format specification, https://www.hdfgroup.org/HDF5/doc/H5.format.html#FileMetaData:
File Consistency Flags
This value contains flags to indicate information about the consistency of the information contained within the file. Currently, the following bit flags are defined:
Bit 0 set indicates that the file is opened for write-access.
Bit 1 set indicates that the file has been verified for consistency and is guaranteed to be consistent with the format defined in this document.
Bits 2-31 are reserved for future use.
Bit 0 should be set as the first action when a file is opened for write access and should be cleared only as the final action when closing a file. Bit 1 should be cleared during normal access to a file and only set after the file's consistency is guaranteed by the library or a consistency utility.
I would assume that the HDF5 API provides a way to open these files exclusively, and I would try that in addition to your polling approach.
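As a small illustration of the two defined bits quoted above, a flags value could be tested like this. The constant and function names are hypothetical, and the sketch assumes the 32-bit flags value has already been read from the file's metadata; locating and reading it is not shown here.

```cpp
#include <cstdint>

// Bit meanings as quoted from the format description above.
const std::uint32_t kOpenForWrite = 1u << 0;        // bit 0: file is opened for write access
const std::uint32_t kVerifiedConsistent = 1u << 1;  // bit 1: file has been verified as consistent

// `flags` is assumed to come from the file's metadata; obtaining it is not shown.
bool IsOpenForWrite(std::uint32_t flags) {
    return (flags & kOpenForWrite) != 0;
}

bool IsVerifiedConsistent(std::uint32_t flags) {
    return (flags & kVerifiedConsistent) != 0;
}
```

Such a check would only complement the polling approach, since it relies on the writer setting and clearing bit 0 as the quoted text describes.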