c linux copy trim sparse-file

Linux/C: copying a file with the last zero-filled hole trimmed


I have software (C/C++ on Linux) made up of three services that use a tmpfs file as shared memory. We ftruncate the file to 1GB for convenience, but in practice it never holds more than 500MB of data, so more than 50% of the file is one big zero-filled hole at the end.

The software offers a tool called download that lets the user get their own offline copy of the file. When making that copy, I don't want the user to see a file that occupies 1GB; I want the "trailing hole" removed instead. We don't need to keep track of the "actual size" all the time, and I would rather know or "deduce" the size only when the user asks to download the file, which is a rather uncommon operation. I want the system to help me find it out.

Any of the following approaches is enough for me:

  • Traversing a list of holes extracted from the file's metadata to find the hole(s) at the end and their size. That way I can calculate the actual size as 1GB minus the combined size of the trailing holes.
  • A syscall that tells me the offset of the highest written byte and/or block?
  • (Least preferable) A terminal command that already does that for me (trimmed copy). It's least preferable because I don't like the use of system("blabla") calls, and I prefer to call C functions.

How can I do that? I prefer the use of system calls or libc, and if there's no comfortable way to do it, then the use of command-line tools as an alternative. Using third-party C/C++ libraries is not an option.


Solution

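  • You can explore the holes in a file through a sequence of lseek calls that pass SEEK_HOLE or SEEK_DATA as the whence argument. SEEK_DATA moves the offset to the start of the next data region at or after the given position, and SEEK_HOLE moves it to the start of the next hole (the end of the file counts as a hole).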

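    Both are available on Linux 3.1+ and on some BSDs. According to the lseek(2) man page, tmpfs supports them since Linux 3.8, and with glibc they are only visible when _GNU_SOURCE is defined.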
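
    The lseek walk described above could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: trimmed_size and copy_trimmed are hypothetical helper names invented for illustration, and the code assumes the filesystem really reports holes. On a filesystem without hole support the whole file is treated as data, so the result is simply the full file size. Holes are also tracked at block (page) granularity, so the value is the end of the last allocated block rather than the exact last byte written, and a block that was explicitly filled with zeros still counts as data.

```c
#define _GNU_SOURCE   /* makes glibc expose SEEK_DATA and SEEK_HOLE */
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: offset just past the last data region of fd.
 * Returns 0 if the file contains no data, or -1 on error.
 * Note: it moves the file offset of fd. */
static off_t trimmed_size(int fd)
{
    off_t end = 0;   /* end of the last data region found so far */
    off_t pos = 0;   /* where the next search starts */

    for (;;) {
        off_t data = lseek(fd, pos, SEEK_DATA);
        if (data == (off_t)-1) {
            /* ENXIO: no more data at or after pos, so only a hole remains */
            return errno == ENXIO ? end : (off_t)-1;
        }
        /* Always succeeds on a data offset: EOF is an implicit hole */
        off_t hole = lseek(fd, data, SEEK_HOLE);
        if (hole == (off_t)-1)
            return (off_t)-1;
        end = hole;
        pos = hole;
    }
}

/* Hypothetical helper: copy in_fd to out_fd without the trailing hole.
 * Returns 0 on success, -1 on error. */
static int copy_trimmed(int in_fd, int out_fd)
{
    off_t remaining = trimmed_size(in_fd);
    off_t off = 0;
    char buf[1 << 16];

    if (remaining < 0)
        return -1;

    while (remaining > 0) {
        size_t want = remaining < (off_t)sizeof buf ? (size_t)remaining
                                                    : sizeof buf;
        /* pread ignores the file offset that trimmed_size() moved */
        ssize_t got = pread(in_fd, buf, want, off);
        if (got < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (got == 0)
            break;   /* file shrank underneath us */

        for (ssize_t done = 0; done < got; ) {
            ssize_t w = write(out_fd, buf + done, (size_t)(got - done));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            done += w;
        }
        off += got;
        remaining -= got;
    }
    return 0;
}
```

    A download tool would open the tmpfs file read-only, open the destination for writing, and call copy_trimmed(in_fd, out_fd). The copy here is a plain read/write loop, so holes in the middle of the file are written out as real zeros; keeping them sparse in the destination would need extra logic that this sketch leaves out.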