filesystems, gridfs, glusterfs, ceph

Distributed File Systems: GridFS vs. GlusterFS vs. Ceph vs. HekaFS Benchmarks


I am currently searching for a good distributed file system.

It should:

  • be open-source
  • be horizontally scalable (replication and sharding)
  • have no single point of failure
  • have a relatively small footprint

Here are the four most promising candidates in my opinion:

  • GridFS
  • GlusterFS
  • Ceph
  • HekaFS

The file system will be used mainly for media files (images and audio). The files range from very small to medium-sized (1 KB to 10 MB), and there should be several million of them.

Are there any benchmarks covering performance, CPU load, memory consumption, and scalability? What are your experiences with these or other distributed file systems?


Solution

  • I'm not sure your list is quite correct. It depends on what you mean by a file system.

    If you mean a file system that can be mounted in an operating system and used by any application that reads and writes files through POSIX calls, then GridFS doesn't really qualify. It is just MongoDB's convention for storing files as chunks inside BSON documents, so it is an object store rather than a file system.

    There is a project to make GridFS mountable (gridfs-fuse), but it is a little weird because GridFS has no concept of hierarchical directories, although path-like names are allowed. I'm also not sure how well distributed writes would work through gridfs-fuse.

    GlusterFS and Ceph are comparable: both are distributed, replicable, mountable file systems. You can read a comparison between the two here (and a follow-up update to that comparison), although keep in mind that the benchmarks were done by someone who is a little biased. You can also watch this debate on the topic.

    As for HekaFS, it is GlusterFS set up for cloud computing: it adds encryption and multitenancy as well as an administrative UI.
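
To make the point about GridFS being an object store more concrete, here is a minimal sketch of the way GridFS lays a file out: one metadata document plus a series of numbered chunk documents. It does not talk to MongoDB at all. The function names `split_into_chunks` and `reassemble` are hypothetical, the documents are plain dicts, and I am assuming the documented default chunk size of 255 KiB. Real metadata documents carry more fields (upload date, filename and so on), so treat this as an illustration rather than the driver's actual implementation.

```python
DEFAULT_CHUNK_SIZE = 255 * 1024  # assumed GridFS default chunk size (255 KiB)


def split_into_chunks(file_id, data, chunk_size=DEFAULT_CHUNK_SIZE):
    """Return (file_doc, chunk_docs) shaped roughly like GridFS documents.

    file_doc mimics an entry in the files collection, chunk_docs mimic
    entries in the chunks collection (hypothetical helper, plain dicts).
    """
    file_doc = {"_id": file_id, "length": len(data), "chunkSize": chunk_size}
    chunk_docs = [
        # "n" is the zero-based sequence number of the chunk within the file
        {"files_id": file_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]
    return file_doc, chunk_docs


def reassemble(chunk_docs):
    """Rebuild the original bytes by concatenating chunks in order of "n"."""
    ordered = sorted(chunk_docs, key=lambda chunk: chunk["n"])
    return b"".join(chunk["data"] for chunk in ordered)


if __name__ == "__main__":
    # The two extremes from the question: a 1 KB file and a 10 MB file.
    for size in (1024, 10 * 1024 * 1024):
        file_doc, chunks = split_into_chunks("demo", bytes(size))
        print(size, "bytes ->", len(chunks), "chunk document(s)")
```

The point is that nothing here looks like a directory tree or a file descriptor. An application has to go through a database driver to read or write these documents, which is why an ordinary program using POSIX calls cannot use GridFS without something like a FUSE layer in between.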