P2P architecture needed


Let's say I want to implement a decentralized Dropbox on a P2P architecture. Each folder is shared by N users. I think all the files need to be stored on every peer, because users need to be able to see, open, write and create files at any time. When a user changes the content of a file, for example, their peer has to send a message to the other peers notifying them of the change, so that all replicas of the file stay consistent.

With this kind of problem, I don't understand why I should use a structured P2P network (such as Chord). Say a folder is shared by three peers (A, B, C). If peerA changes a file and notifies only peerB, expecting peerB to forward the change to peerC, and peerB disconnects before sending that message, then peerA and peerC end up with different versions of the file.

Wouldn't it be simpler for each peer to store a reference to every other peer and, whenever it changes something, send a message to each of them directly (without relying on other peers to forward it)? What is the problem with this approach? Is there an established architecture for this kind of system?
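The all-to-all approach proposed in the question can be sketched as a small in-memory simulation. This is only an illustration under assumptions: the `Peer` class, its `online` flag and the file name are hypothetical. It shows that direct broadcast costs N - 1 messages per change, and that a peer that happens to be offline at that moment still misses the update, so the sender would need some retry or catch-up mechanism anyway.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """Hypothetical peer holding a full replica of the shared folder."""
    name: str
    online: bool = True
    files: dict[str, str] = field(default_factory=dict)
    peers: list["Peer"] = field(default_factory=list)  # references to all other peers

    def receive(self, path: str, content: str) -> bool:
        # An offline peer cannot accept the update.
        if not self.online:
            return False
        self.files[path] = content
        return True

    def write(self, path: str, content: str) -> list[str]:
        """Apply a local change, then notify every other peer directly.

        Returns the names of the peers that missed the update.
        """
        self.files[path] = content
        missed = []
        for other in self.peers:  # N - 1 messages per change
            if not other.receive(path, content):
                missed.append(other.name)
        return missed


a, b, c = Peer("A"), Peer("B"), Peer("C")
a.peers, b.peers, c.peers = [b, c], [a, c], [a, b]

c.online = False                     # C is disconnected when A writes
missed = a.write("notes.txt", "v2")
print(missed)                                   # ['C']
print(b.files == a.files, c.files == a.files)   # True False
```

A now knows that C is stale, but it has to remember that and retry later; if A goes offline first, C only catches up if some other peer passes the change on.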


Solution

I don't understand why you think anyone wants you to use Chord for this.

There are different architectures for this. One is the BitTorrent network, whose architecture BitTorrent Sync was originally based on. BitTorrent Sync differs from a normal torrent in that there is no single file describing the torrent's contents; instead, the client watches the local hard drive for changes and updates its information about which files exist. Once the files are being distributed, though, it works just like a torrent: you send half of a file to one node and the other half to another node, and those nodes can then exchange their parts with each other. This way you only have to upload the file once, and it is synchronised much faster.
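The piece-based distribution described above can be illustrated with a small simulation. It is a sketch under simplifying assumptions, not BitTorrent Sync's actual protocol: the `Node` class and the 4-byte piece size are hypothetical, and only the idea of verifying each piece against its SHA-1 hash is taken from the BitTorrent v1 format. The origin uploads every piece exactly once, half to each node, and the two nodes then fill in each other's missing pieces.

```python
import hashlib

PIECE_SIZE = 4  # bytes; tiny for illustration, real clients use much larger pieces


def split_pieces(data: bytes, size: int = PIECE_SIZE) -> list[bytes]:
    return [data[i:i + size] for i in range(0, len(data), size)]


def piece_hashes(pieces: list[bytes]) -> list[str]:
    # BitTorrent v1 verifies each downloaded piece against its SHA-1 hash.
    return [hashlib.sha1(p).hexdigest() for p in pieces]


class Node:
    def __init__(self, name: str, hashes: list[str]):
        self.name = name
        self.hashes = hashes              # metadata: expected hash of each piece
        self.have: dict[int, bytes] = {}  # piece index -> piece data

    def accept(self, index: int, piece: bytes) -> bool:
        # Only keep pieces whose hash matches the metadata.
        if hashlib.sha1(piece).hexdigest() != self.hashes[index]:
            return False
        self.have[index] = piece
        return True

    def exchange_with(self, other: "Node") -> None:
        # Send every piece the other node is still missing.
        for index, piece in self.have.items():
            if index not in other.have:
                other.accept(index, piece)

    def complete(self) -> bool:
        return len(self.have) == len(self.hashes)

    def assemble(self) -> bytes:
        return b"".join(self.have[i] for i in range(len(self.hashes)))


data = b"hello shared folder!"
pieces = split_pieces(data)
hashes = piece_hashes(pieces)

b, c = Node("B", hashes), Node("C", hashes)
uploads = 0
# The origin uploads each piece exactly once: the first half to B, the rest to C.
half = len(pieces) // 2
for i, piece in enumerate(pieces):
    (b if i < half else c).accept(i, piece)
    uploads += 1

# B and C then trade the pieces they are missing.
b.exchange_with(c)
c.exchange_with(b)
print(uploads, b.complete(), c.complete())   # 5 True True
print(b.assemble() == data)                  # True
```

The origin's upload cost equals the size of the file, no matter how many peers end up with a full copy; the peers carry the rest of the traffic among themselves.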

Another way to synchronise such a system is to aim for eventual consistency. A very good protocol for this is TSAE (timestamped anti-entropy), which ensures that all nodes eventually reach the same state without every node constantly communicating with every other node. With this approach it doesn't matter who forwards the original message. However, I don't know how well it copes with changes in membership: as far as I understand it, the number of peers isn't allowed to change, and once a peer has reported that it received a message, it can't change its answer later. You can read more about it in Richard Andrew Golding's 1992 dissertation, "Weak-consistency group communication and membership" (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.88.7385&rep=rep1&type=pdf), starting from chapter 5.
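The core idea of anti-entropy can be sketched as pairwise sessions in which two nodes exchange summary vectors (here, the highest sequence number each node has seen from every origin) and then send each other whatever the partner is missing. This is a simplified sketch, not Golding's full TSAE protocol: it uses per-origin sequence numbers in place of TSAE's timestamps, omits acknowledgement vectors and log purging, and the last-writer-wins rule for concurrent writes to the same file is an assumption added here, not part of TSAE. It replays the question's scenario in which B disconnects before forwarding A's change to C.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Update:
    origin: str   # node that created the update
    seq: int      # per-origin sequence number, summarised in the vector
    clock: int    # logical clock, used only for last-writer-wins (assumption)
    path: str
    content: str


@dataclass
class Node:
    name: str
    seq: int = 0
    clock: int = 0
    log: dict[tuple[str, int], Update] = field(default_factory=dict)
    summary: dict[str, int] = field(default_factory=dict)  # origin -> highest seq seen
    files: dict[str, tuple[tuple[int, str], str]] = field(default_factory=dict)

    def write(self, path: str, content: str) -> None:
        self.seq += 1
        self.clock += 1
        self.deliver(Update(self.name, self.seq, self.clock, path, content))

    def deliver(self, u: Update) -> None:
        self.log[(u.origin, u.seq)] = u
        self.summary[u.origin] = max(self.summary.get(u.origin, 0), u.seq)
        self.clock = max(self.clock, u.clock)
        # Last-writer-wins on (clock, origin) so every node picks the same winner.
        stamp = (u.clock, u.origin)
        current = self.files.get(u.path)
        if current is None or stamp > current[0]:
            self.files[u.path] = (stamp, u.content)

    def missing_for(self, their_summary: dict[str, int]) -> list[Update]:
        # Updates in our log that the partner's summary says it has not seen yet.
        return sorted(
            (u for u in self.log.values() if u.seq > their_summary.get(u.origin, 0)),
            key=lambda u: (u.origin, u.seq),
        )


def anti_entropy(x: Node, y: Node) -> None:
    """One pairwise session: compare summaries, then ship the missing updates."""
    to_y = x.missing_for(y.summary)
    to_x = y.missing_for(x.summary)
    for u in to_y:
        y.deliver(u)
    for u in to_x:
        x.deliver(u)


a, b, c = Node("A"), Node("B"), Node("C")
a.write("notes.txt", "v1 from A")
anti_entropy(a, b)      # A reaches only B; B then goes offline before contacting C
c.write("todo.txt", "from C")
anti_entropy(c, a)      # later, C happens to pick A as its partner
anti_entropy(a, b)      # B comes back and syncs with A
print(a.files == b.files == c.files)  # True
```

C receives A's change even though B never forwarded it, because any node that already has an update can hand it on in a later session.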

Finally, Chord is meant for something completely different. Imagine you want to share files with thousands of peers, and each peer contributes 10 GB of files. 10 GB is nothing for a modern computer, but more than 10 TB (10 GB multiplied by thousands of peers) is a lot. This is what Chord is for: one node is responsible for a given file, so you can fetch the latest copy from that node, ask it for a lock, and so on. You wouldn't distribute the file to all the other machines; instead, this responsible node would replicate it to 2, 3 or 10 other nodes (depending on your replication factor) so that the file stays in the network even when that node disconnects. That is how Chord is meant to work, and it is of course not the best solution if you can communicate with all the peers at the same time and each of them can (or has to) hold the same information. But that all-to-all approach will reach its limits quite early.
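The "one node is responsible for a file" idea can be sketched with consistent hashing on a Chord-style identifier ring: node addresses and file names are hashed with SHA-1 onto the same ring, a file belongs to the first node clockwise from its ID (its successor), and replicas go to the next nodes along the ring. This is a minimal sketch under assumptions: the `Ring` class, the node addresses and the file name are hypothetical, and it keeps a global sorted list of all node IDs instead of Chord's finger tables, so it shows where data lives, not how Chord routes a lookup in O(log N) hops.

```python
import bisect
import hashlib


def ring_id(name: str) -> int:
    """Map a node address or file name onto the 160-bit SHA-1 identifier ring."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")


class Ring:
    def __init__(self, nodes: list[str]):
        self.ids = sorted(ring_id(n) for n in nodes)
        self.by_id = {ring_id(n): n for n in nodes}

    def responsible(self, filename: str, replication: int = 3) -> list[str]:
        """Return the file's successor node followed by its replica holders."""
        # First node ID >= the file's ID; wraps past the end of the ring.
        start = bisect.bisect_left(self.ids, ring_id(filename))
        count = min(replication, len(self.ids))
        return [self.by_id[self.ids[(start + k) % len(self.ids)]] for k in range(count)]


nodes = [f"10.0.0.{i}:4000" for i in range(1, 21)]   # 20 hypothetical peers
ring = Ring(nodes)
holders = ring.responsible("photos/2014/beach.jpg", replication=3)
print(len(holders))  # 3

# If the responsible node leaves, the next successor (already a replica) takes over.
smaller = Ring([n for n in nodes if n != holders[0]])
print(smaller.responsible("photos/2014/beach.jpg")[0] == holders[1])  # True
```

Each file is stored on only `replication` nodes rather than on every peer, and when the responsible node disconnects, its first replica holder becomes the file's new successor, so the data is still available.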