python filesystems google-docs-api fuse

Limiting file explorer mini-reads


I'm implementing a FUSE driver for Google Drive. The aim is to allow a user to mount her Google Drive/Docs account as a virtual filesystem. Full source at https://github.com/jforberg/drivefs. I use the fusepy bindings to integrate FUSE with Python, and Google's Document List API to access Drive.

My driver is complete to the degree that readdir(2), stat(2) and read(2) work as expected. In the filesystem, each file read translates to an HTTPS request, which has a large overhead. I've managed to limit the overhead by forcing a larger buffer size for reads.
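To illustrate the larger-buffer idea, here is a minimal sketch, not the actual drivefs code. It assumes a hypothetical `fetch(path, start, length)` callable that stands in for the HTTPS request, and a block size of 1 MiB chosen arbitrarily. Small reads are served from whole cached blocks, so many small read(2) calls cost one request per block.

```python
class BlockCache:
    """Serve small reads from large cached blocks to cut remote requests."""

    def __init__(self, fetch, block_size=1024 * 1024):
        # fetch(path, start, length) -> bytes is a hypothetical stand-in for
        # the HTTPS request; it may return fewer bytes at end of file.
        self.fetch = fetch
        self.block_size = block_size
        self.blocks = {}  # (path, block index) -> bytes; never evicted here

    def _block(self, path, index):
        key = (path, index)
        if key not in self.blocks:
            start = index * self.block_size
            self.blocks[key] = self.fetch(path, start, self.block_size)
        return self.blocks[key]

    def read(self, path, size, offset):
        out = bytearray()
        end = offset + size
        while offset < end:
            index, start = divmod(offset, self.block_size)
            chunk = self._block(path, index)[start:start + (end - offset)]
            if not chunk:
                break  # reached end of file
            out += chunk
            offset += len(chunk)
        return bytes(out)
```

A real driver would also need to bound the cache and invalidate blocks when the remote file changes; both are left out of this sketch.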

Now to my problem. File explorers like Thunar and Nautilus build thumbnails and determine file types by reading the first part of each file (the first 4 kB or so). But in my filesystem, reading from many files at once is a painful procedure, and getting a file listing in Thunar takes a very long time compared with a simple ls (which only stat(2)s each file).

I need some way to tell file explorers that my filesystem does not play well with "mini-reads", or some way to identify these mini-reads and feed them made-up data to make them happy. Any help would be appreciated!

EDIT: The problem was not with HTTPS overhead, but with my handling of Google's native "doc" format. I added a line to make read(2) return an empty string when someone tries to read a native doc, and the file listing is now almost instantaneous.

This seems a mild limitation, as not even Google's official client program is able to edit native docs.
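The fix described in the edit can be sketched roughly as follows. This is an illustration under assumptions, not the code from the repository: `DriveReadSketch`, `is_native_doc` and `fetch` are hypothetical names, and only the argument order of `read` (path, size, offset, fh) follows the fusepy `Operations.read` convention. Under Python 3 the empty result would be `b""` rather than an empty string.

```python
class DriveReadSketch:
    """Return no data for native docs; fetch everything else remotely."""

    def __init__(self, is_native_doc, fetch):
        # is_native_doc(path) -> bool and fetch(path, offset, size) -> bytes
        # are hypothetical callables supplied by the rest of the driver.
        self.is_native_doc = is_native_doc
        self.fetch = fetch

    def read(self, path, size, offset, fh=None):
        if self.is_native_doc(path):
            # Native docs have no plain byte content to serve, so report
            # an empty file instead of making a slow remote request.
            return b""
        return self.fetch(path, offset, size)
```

With this in place, a file manager probing the first few kilobytes of a native doc gets an immediate empty read and moves on.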


Solution

  • Here is pycloudfuse, a similar attempt but for Cloud Files / OpenStack object storage, in which you might find some useful bits.

    When writing this I can't say I noticed any problems with Thunar and Nautilus with the directory listings.

    I don't think you can feed the file managers made-up data; that is bound to lead to problems.

    I like the option of signalling to the file explorer not to do thumbnails etc., but I don't think that is possible either.

    I think the best option is to remind your users that drivefs is not a real filesystem and to give them a list of its limitations. If it is anything like pycloudfuse, there will be lots!