Tags: r, io, memory-mapped-files, buffer, bigdata

Testing whether buffers have been flushed in R


I have some big, big files that I work with and I use several different I/O functions to access them. The most common one is the bigmemory package.

When writing to the files, I've learned the hard way to flush output buffers, otherwise all bets are off on whether the data was saved. However, this can lead to some very long wait times while bigmemory does its thing (many minutes). I don't know why this happens - it doesn't always occur and it's not easily reproduced.

Is there some way to determine whether or not I/O buffers have been flushed in R, especially for bigmemory? If the operating system matters, then feel free to constrain the answer in that way.

If an answer can be generalized beyond bigmemory, that would be great, as I sometimes rely on other memory mapping functions or I/O streams.

If there is no good way to check whether buffers have been flushed, are there cases in which it can be assumed that they have been, i.e. without calling flush()?

Update: I should clarify that these are all binary connections. @RichieCotton suggested isIncomplete(), but its help documentation only mentions text connections, so it's not clear whether it is usable for binary connections.


Solution

  • I'll put forward my own answer, but I welcome anything that is clearer.

    From what I've seen so far, the various connection functions, e.g. file, open, close, flush, isOpen, and isIncomplete (among others), are based on specific connection types, e.g. files, pipes, URLs, and a few other things.

    In contrast, bigmemory does not go through R's connection machinery: the bigmemory object is an S4 object with a slot holding a memory address, and for file-backed data that memory is backed by operating system buffers. Once the data is placed there, the OS is in charge of flushing those buffers. Since that is an OS responsibility, getting information on "dirty" buffers requires interacting with the OS, not with R.

    Thus, the answer for bigmemory is "no", because the data sits in kernel buffers. It may be "yes" for other connections that are handled through stdio, i.e. buffered in user space.

    For more insight on the OS / kernel side of things, see this question on SO; I am investigating a couple of programs (not just R + bigmemory) that are producing buffer flushing curiosities, and that thread helped to enlighten me about the kernel side of things.
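
To illustrate the user-space side of the distinction above, here is a minimal sketch using only base R and an ordinary binary file connection (not bigmemory). The temporary file and the variable names are made up for the example. It compares the file size the OS reports before and after flush(). Note that this only shows the data leaving R's stdio buffer for the kernel; it says nothing about whether the kernel has written its own buffers to disk, which is the part R cannot see.

```r
# Write doubles to a binary file connection and watch the on-disk size
# reported by the OS before and after flush(). Base R only.
path <- tempfile(fileext = ".bin")   # hypothetical scratch file
con  <- file(path, open = "wb")

x <- runif(10)                       # 10 doubles, 8 bytes each
writeBin(x, con)

# Data may still sit in the user-space (stdio) buffer at this point,
# so the size seen by the OS can be smaller than 80 bytes.
size_before <- file.size(path)

flush(con)                           # push the stdio buffer to the kernel
size_after <- file.size(path)        # the OS now sees all 80 bytes

open_flag <- isOpen(con)
close(con)

# Read the file back to confirm the contents round-trip.
y <- readBin(path, what = "double", n = 10)

cat("before flush:", size_before, "bytes; after flush:", size_after, "bytes\n")
```

A size check like this is only a rough proxy for "has my connection been flushed", and it does not carry over to memory-mapped objects such as those in bigmemory, where no user-space buffer is involved.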