I am perplexed as to how a file can be deleted (trashed) but still be linked by a process and still be written to. My understanding is that a file name is an entry in a directory that points to an inode. An inode is a data structure that lists the logical blocks that make up a file. When a file is deleted, its directory entry is removed but the data and the inode are untouched. The kernel then checks whether there are any other links to the inode, and if not, the inode is deleted and its space reclaimed. But how can such a nameless file be created? Whenever I open a file and read or write to it in Python, it exists on the disk. But when I issue this command on my Mac:
lsof +L1
I get a list of 300 files! From what I have read, these are files with only one link, the one to the process writing to them. If there were directory entries for these files as well, there would be at least two links. So, somewhere along the way, the processes writing to these files deleted their directory entries? Why? How?
Open a file and then unlink it. Now you have an open file descriptor to a file with no link in a directory.
Unlinking the file can be accomplished programmatically using the unlink() system call. From the command line, the commands rm or unlink can be used. These are effectively just a means to call the unlink() system call.
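The open-then-unlink sequence can be sketched in Python. This is a minimal illustration, assuming a local filesystem where an unlinked but still open file reports a link count of zero (network filesystems such as NFS can behave differently). The scratch.txt name is only a placeholder.

```python
import os
import tempfile

# Scratch location for the demonstration (names are illustrative).
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "scratch.txt")

# Open the file, which creates its one and only directory entry.
f = open(path, "w+")
f.write("before unlink\n")
f.flush()
links_before = os.fstat(f.fileno()).st_nlink

# Remove the directory entry; the descriptor held by `f` stays valid.
os.unlink(path)
exists_after = os.path.exists(path)
links_after = os.fstat(f.fileno()).st_nlink

# The nameless file can still be written to and read back.
f.write("after unlink\n")
f.seek(0)
contents = f.read()

# Closing the last descriptor lets the kernel reclaim the inode and blocks.
f.close()
os.rmdir(workdir)

print(links_before, exists_after, links_after)
print(contents, end="")
```

While the descriptor is open, this is the kind of file that lsof +L1 reports.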
Many of the file descriptors that lsof shows you probably come about because some program opened a file and then some other program replaced that file. There are several ways to write out a file. You can open the existing file, truncate its contents, and then write new contents. You can unlink the file and then open/create a new file and write new contents. Or you can write to a separate file, atomically swap the two directory entries to put the new file in the place of the old file, and then unlink the old file (e.g. with the rename() or exchangedata() system calls). The latter two approaches will leave any already opened file descriptors pointing to the old file's inode even though it's no longer linked from a directory.
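The write-to-a-separate-file-and-rename approach can be sketched as follows. This is only an illustration, assuming a POSIX system, where Python's os.replace() is implemented with rename(). The file names data.txt and data.txt.new are hypothetical.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "data.txt")
tmp_path = path + ".new"

# The original version of the file.
with open(path, "w") as f:
    f.write("old contents\n")

# A long-running "reader" opens the file and keeps the descriptor.
reader = open(path)
old_inode = os.fstat(reader.fileno()).st_ino

# A "writer" produces a new version elsewhere and renames it into place.
with open(tmp_path, "w") as f:
    f.write("new contents\n")
os.replace(tmp_path, path)

# The name now refers to a different inode than the reader's descriptor.
new_inode = os.stat(path).st_ino
seen_by_reader = reader.read()
with open(path) as f:
    seen_by_new_open = f.read()

reader.close()
os.remove(path)
os.rmdir(workdir)

print(old_inode != new_inode)
print(repr(seen_by_reader), repr(seen_by_new_open))
```

Until the reader closes its descriptor, the old inode stays alive without any directory entry, even though the writer never called unlink() explicitly.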
For example, on my system lsof shows many descriptors open to /private/var/folders/.../mds/mdsDirectory.db. That is probably related to Spotlight. There is a version of that file on disk, but not with the same inode. So, probably something opened it at time t0, something else wrote out a new version at time t1 and unlinked the old one, and I checked at time t2.
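From inside a process, the same "same name, different inode" check can be made by comparing device and inode numbers. The helper below, descriptor_matches_path, is a hypothetical name introduced for this sketch, and the t0/t1/t2 comments mirror the timeline above. The file name example.db is a placeholder.

```python
import os
import tempfile

def descriptor_matches_path(fd, path):
    """Return True if fd and path currently refer to the same file."""
    try:
        on_disk = os.stat(path)
    except FileNotFoundError:
        return False  # the name is gone entirely
    opened = os.fstat(fd)
    return (opened.st_dev, opened.st_ino) == (on_disk.st_dev, on_disk.st_ino)

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "example.db")
with open(path, "w") as f:
    f.write("version 1\n")

# t0: something opens the file.
fd = os.open(path, os.O_RDONLY)
matches_at_t0 = descriptor_matches_path(fd, path)

# t1: something else writes a new version and renames it over the old one.
with open(path + ".new", "w") as f:
    f.write("version 2\n")
os.replace(path + ".new", path)

# t2: the descriptor no longer matches what the name points to.
matches_at_t2 = descriptor_matches_path(fd, path)

os.close(fd)
os.remove(path)
os.rmdir(workdir)
print(matches_at_t0, matches_at_t2)
```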