
How to create a distributed file system


Just for self-education, I decided to implement a "hello world" distributed file system, the simplest one possible, and to read up on the theory behind it first. But when I search Google for this, the results are things like "how to configure HDFS" or "how to set up a distributed FS on Windows", which is not what I am interested in...

Could someone please point me to some good articles or books on this subject? Thanks a lot!


Solution

  • Well, if you have really decided to implement such a file system, you must start with distributed systems in general. I recommend Tanenbaum's reference book, Distributed Systems: Principles and Paradigms: http://www.distributed-systems.net/index.php?id=distributed-systems-principles-and-paradigms

    Be careful: the subject is really complex, and distributed systems are anything but simple to implement.

    If you want to look at some existing distributed file systems, have a look at GFS/GFS2 (Red Hat's Global File System), at OCFS2 from Oracle, or at Gluster: https://fr.wikipedia.org/wiki/GlusterFS

    You may also be able to find papers describing the Google File System (a different system from Red Hat's GFS), whose design Google published in 2003.

    The main problem in such a distributed system is failure detection: detecting when a node crashes while writing to the file system, so that you can make sure the crash leaves no corruption behind. There are multiple strategies; one is to implement a journal protected by a distributed lock.
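To make the journal idea concrete, here is a minimal single-process sketch in Python. It is only an illustration under strong assumptions: the class `JournaledStore` and its methods are hypothetical, a Python list stands in for the durable shared journal, and `threading.Lock` stands in for the distributed lock a real cluster would need. Each write logs its intent before touching the data, so a write interrupted by a crash can be redone during recovery.

```python
import threading


class JournaledStore:
    """Toy key/value "file system" whose writes go through a redo journal.

    Hypothetical illustration. Assumptions: a Python list stands in for a
    durable journal on shared storage, and threading.Lock stands in for a
    distributed lock service (a real one would also need leases so that a
    crashed node's lock eventually expires).
    """

    def __init__(self):
        self.data = {}            # current file contents: path -> content
        self.journal = []         # append-only redo log of (seq, path, content)
        self.committed = set()    # seq numbers whose write fully completed
        self.lock = threading.Lock()

    def write(self, path, content, crash=False):
        with self.lock:
            seq = len(self.journal)
            self.journal.append((seq, path, content))  # 1. log the intent first
            if crash:
                # Simulate the node dying after logging but before applying.
                raise RuntimeError(f"node crashed during write #{seq}")
            self.data[path] = content                  # 2. apply the change
            self.committed.add(seq)                    # 3. mark it complete

    def recover(self):
        """Redo every journaled write in order; return seqs that were incomplete."""
        with self.lock:
            incomplete = [s for s, _, _ in self.journal if s not in self.committed]
            # Replaying whole-value writes in log order is idempotent, so
            # running recovery twice yields the same contents.
            for seq, path, content in self.journal:
                self.data[path] = content
                self.committed.add(seq)
            return incomplete


if __name__ == "__main__":
    fs = JournaledStore()
    fs.write("/a.txt", "hello")
    try:
        fs.write("/b.txt", "world", crash=True)
    except RuntimeError as err:
        print(err)          # node crashed during write #1
    print(fs.data)          # {'/a.txt': 'hello'}
    print(fs.recover())     # [1]
    print(fs.data)          # {'/a.txt': 'hello', '/b.txt': 'world'}
```

The key design choice is ordering: because the intent reaches the journal before the data changes, recovery never has to guess what an interrupted write was meant to do.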

    Another major (classic) problem is 'split brain': the cluster is split into two groups by a network failure (imagine a broken switch). Each group 'thinks' the other one is dead because it cannot communicate with it, but there is no way to be sure that the remote group is not still writing data, so the two copies of the data can diverge.
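One common mitigation for split brain, swapped in here as an illustration rather than taken from the answer, is a majority quorum: a group of nodes may keep accepting writes only while it can reach a strict majority of the cluster. Two disjoint groups can never both hold a strict majority, so at most one side keeps writing. A minimal Python sketch, where the functions `has_quorum` and `writable_groups` and the node names are hypothetical:

```python
from __future__ import annotations


def has_quorum(reachable: set[str], cluster: set[str]) -> bool:
    """True if `reachable` (nodes that can still talk to each other) holds a
    strict majority of all cluster members."""
    return len(reachable & cluster) > len(cluster) // 2


def writable_groups(groups: list[set[str]], cluster: set[str]) -> list[set[str]]:
    """Given the disjoint groups produced by a network failure, return the ones
    allowed to keep writing. Disjoint groups cannot both hold a strict
    majority, so the result has at most one element."""
    return [g for g in groups if has_quorum(g, cluster)]


if __name__ == "__main__":
    five = {"n1", "n2", "n3", "n4", "n5"}
    # A broken switch splits five nodes 3/2: only the larger side keeps writing.
    print([sorted(g) for g in writable_groups([{"n1", "n2", "n3"}, {"n4", "n5"}], five)])
    # prints [['n1', 'n2', 'n3']]

    four = {"n1", "n2", "n3", "n4"}
    # A 2/2 split leaves no side with a majority, so both sides stop writing.
    print(writable_groups([{"n1", "n2"}, {"n3", "n4"}], four))  # prints []
```

The 2/2 case shows the trade-off: the cluster gives up availability to avoid divergent data, which is why clusters are often built with an odd number of nodes or given an extra tie-breaking vote.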

    I hope you find what you are looking for in all this.

    Edit: GFS is now deprecated; Red Hat is using and developing Ceph.