linux, io, named-pipes, fuse

(Linux) Read multiple files as a single one without having to copy the chunks to a new file first


(Linux) The problem at hand is the following:

Let's suppose we have foo_1 and foo_2, which are in fact two chunks of the file foo, so that foo could be rebuilt with the command:

cat foo_1 foo_2 >foo

I would like the system to treat {foo_1 + foo_2} as a single file foo, without first having to copy the chunks with the command above.

Depending on the command you use to read {foo_1 + foo_2}, say you want an md5sum, you can just use a named pipe, which provides this feature.

You would do:

mkfifo my_named_pipe
cat foo_1 foo_2 >my_named_pipe &
md5sum my_named_pipe

That works!

But named pipes have a big limitation: all accesses must be sequential (no seek), since a named pipe is still basically a pipe.

Hence this "named pipes" method is not a generic way to "read multiple files as a single virtual file".

Indeed, it works in the example above because md5sum only needs to read the file sequentially. If the file were, say, a rar archive, a video you would like to play with VLC, or an ISO you would like to mount and access randomly, it would fail, since those programs need non-sequential reads.
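
The "no seek" limitation is easy to check. Here is a minimal Python sketch, assuming Linux and Python 3; the function name fifo_seek_errno and the location of the pipe are made up for the illustration. It creates a FIFO in a temporary directory, tries to seek on it, and returns the error number reported by the kernel, which should be ESPIPE ("Illegal seek").

```python
import errno
import os
import tempfile


def fifo_seek_errno():
    """Try to seek on a FIFO; return the errno raised, or None if the seek worked."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "my_named_pipe")
        os.mkfifo(path)
        # O_NONBLOCK lets the read-only open return at once, even with no writer.
        fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
        try:
            os.lseek(fd, 10, os.SEEK_SET)  # pipes have no file offset
        except OSError as exc:
            return exc.errno
        finally:
            os.close(fd)
    return None


if __name__ == "__main__":
    code = fifo_seek_errno()
    print("seek failed with ESPIPE:", code == errno.ESPIPE)
```

Any program that calls lseek (or pread) on its input hits this same error, which is why archive tools, video players and loop mounts cannot work from the pipe.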

Question: so, before calling the cavalry, I mean writing a FUSE filesystem myself that does what I described above to save precious I/O and space, I would like to know whether you have heard of a generic method to do this.

What I am thinking of is something looking like:

fuseVirtualFile mountpoint foo foo_1 foo_2

That would show the "virtual file" foo under mountpoint, i.e. as mountpoint/foo.

This "virtual file" would be the read-only concatenation of foo_1 and foo_2, without actually having to do the write I/O, which saves time, disk space, and wear on the SSD!
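
To make the idea concrete, here is a rough Python sketch of the offset mapping such a virtual file needs. It is not a FUSE filesystem and not how any existing tool is implemented; the class name ConcatReader, its pread method and the demo helper are all hypothetical. A read at a virtual offset is translated into reads on the chunk or chunks covering that range, so nothing is copied to disk.

```python
import bisect
import os
import tempfile


class ConcatReader:
    """Read-only, seekable view over several files, as if they were concatenated."""

    def __init__(self, paths):
        self.fds = [os.open(p, os.O_RDONLY) for p in paths]
        # starts[i] is the offset in the virtual file where chunk i begins.
        self.starts = []
        total = 0
        for fd in self.fds:
            self.starts.append(total)
            total += os.fstat(fd).st_size
        self.size = total

    def pread(self, length, offset):
        """Return up to `length` bytes starting at virtual `offset`."""
        out = bytearray()
        while length > 0 and offset < self.size:
            # Pick the last chunk whose start is <= offset (this skips empty chunks).
            i = bisect.bisect_right(self.starts, offset) - 1
            data = os.pread(self.fds[i], length, offset - self.starts[i])
            if not data:  # chunk shrank after it was opened
                break
            out += data
            offset += len(data)
            length -= len(data)
        return bytes(out)

    def close(self):
        for fd in self.fds:
            os.close(fd)


def demo():
    """Build two small chunks and read a range that spans both of them."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for name, payload in (("foo_1", b"hello "), ("foo_2", b"world")):
            path = os.path.join(tmp, name)
            with open(path, "wb") as f:
                f.write(payload)
            paths.append(path)
        reader = ConcatReader(paths)
        try:
            return reader.size, reader.pread(5, 3)
        finally:
            reader.close()


if __name__ == "__main__":
    print(demo())
```

A FUSE read handler could, in principle, delegate to something like pread(length, offset) and report size as the file length, which is all a read-only mountpoint/foo would need for random access.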


Solution

  • So, since it apparently didn't exist, I just created it!

    Behold: mfs

    This is a FUSE filesystem that does what my question asks for, which is to "virtually merge several files into a single one".

    Then it becomes possible to access (read-only) the merged file as if it had actually been merged into a single file by a cat command.

    As already stated in the question, this is only useful if you need random read access, since stream access can be done via named pipes.

    Here it is: https://github.com/Bylon/mfs

    Enjoy!