git, distributed-system

What is the meaning of the word "Distributed" in "Distributed Version Control System", as in Git?


I have already read the question "Distributed Version Control System" looking for an answer, but my question is different: it arises from comparing a "Distributed Version Control System" with other "Distributed Systems".

When I see the word "Distributed" in other terms such as "Distributed Database", "Distributed Cache" and "Distributed Computing", I find that the data or computation really is distributed over the network. There, "Distributed" means "Divided" (though not always equally). For example, in Hazelcast, a "Distributed Cache" system, the keys really are divided among the available nodes. But I do not find this similarity in a "Distributed Version Control System".

  1. Does "Distributed" mean "Divided" in "Distributed Version Control System"?
  2. If YES, then what is divided (because I do not see any division in the code or the commit history)?

Solution

  • With respect to version control systems, "distributed" is just the antonym for "centralized". A Centralized Version Control System has a single central or master server. It may or may not have additional servers, but if it does have additional servers and if those servers disagree with the designated central server, then those servers are wrong: the central server is the source of truth. In a distributed version control system, there is no such server: all repositories are peers, at least from a design point of view. Any system with a distributed design can, of course, be used as if it were centralized. One can use Git in this fashion by designating one of the Git repositories as the main repository for updates.
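The peer relationship described above can be illustrated with a toy model. This is only a minimal sketch, not how Git is implemented: the `Repo` class, its methods, and the repository names are hypothetical, and history is reduced to a flat mapping of commit id to message.

```python
"""Toy model of a distributed VCS: every repository is a full peer.

Hypothetical illustration only; real Git stores a graph of objects,
not a flat dictionary.
"""


class Repo:
    """A greatly simplified repository: commit id -> commit message."""

    def __init__(self, name):
        self.name = name
        self.commits = {}  # each repository holds all the history it knows

    def commit(self, commit_id, message):
        self.commits[commit_id] = message

    def clone(self, name):
        """A clone copies the entire history, not a slice of it."""
        new = Repo(name)
        new.commits = dict(self.commits)
        return new

    def fetch_from(self, other):
        """Any peer can fetch from any other peer; no repository is special."""
        for commit_id, message in other.commits.items():
            self.commits.setdefault(commit_id, message)


origin = Repo("origin")
origin.commit("a1", "initial commit")

alice = origin.clone("alice")
bob = origin.clone("bob")

alice.commit("b2", "alice's change")
bob.fetch_from(alice)   # peer to peer, bypassing "origin" entirely
origin.fetch_from(bob)  # "origin" is central only by convention
```

Nothing in the model distinguishes `origin` from `alice` or `bob`; treating it as the main repository is a convention the users agree on, which is the sense in which a distributed system can be used as if it were centralized.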

    When I see the word "Distributed" in other terms such as "Distributed Database", "Distributed Cache" and "Distributed Computing", I find that the data or computation really is distributed over the network. There, "Distributed" means "Divided" (though not always equally). For example, in Hazelcast, a "Distributed Cache" system, the keys really are divided among the available nodes. But I do not find this similarity in a "Distributed Version Control System".

    Distributed databases with replication do not necessarily divide their storage. For instance, etcd uses the Raft consensus algorithm, in which a leader is elected by a quorum (a majority) of the cluster members, and all members attempt to keep their copies of the data up to date. Cache behavior in multi-processor systems is often a form of distributed storage as well, though generally considerably more tightly coupled. See, e.g., the Wikipedia entry for cache coherence. Distributed systems with replication can generally be classified by their consistency models.
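The difference between dividing data and replicating it can be sketched as follows. This is a simplified illustration under stated assumptions: the node names, `partitioned_put`, `replicated_put`, and `quorum_size` are hypothetical helpers, not the API of Hazelcast or etcd, and real systems add consensus, failure handling, and rebalancing on top.

```python
import hashlib

NODES = ["node-0", "node-1", "node-2"]  # hypothetical node names


def stable_hash(key):
    """Deterministic hash, so that key placement is reproducible."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


def partitioned_put(stores, key, value):
    """Partitioning ("divided"): each key lives on exactly one node."""
    owner = NODES[stable_hash(key) % len(NODES)]
    stores[owner][key] = value
    return owner


def replicated_put(stores, key, value):
    """Replication: every node stores every key."""
    for node in NODES:
        stores[node][key] = value


def quorum_size(n):
    """Smallest majority of n members."""
    return n // 2 + 1


partitioned = {node: {} for node in NODES}
replicated = {node: {} for node in NODES}

for k in ["a", "b", "c", "d"]:
    partitioned_put(partitioned, k, k.upper())
    replicated_put(replicated, k, k.upper())

# Partitioned: the four keys are spread out, one copy of each in total.
print({node: sorted(store) for node, store in partitioned.items()})
# Replicated: every node holds all four keys.
print({node: sorted(store) for node, store in replicated.items()})
```

In this picture a Git clone resembles the replicated store: each repository holds the whole history rather than a share of it.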

    (A quick search suggests that Hazelcast keeps backups to handle node failure, so it must use some kind of consistency model as well. If some subset of the data in a distributed system is stored solely on a single node, those data become unavailable if that node fails. Since the probability that at least one node fails generally increases with the number of nodes, this is usually not acceptable.)
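The claim about failure probability can be made concrete with a small calculation. This sketch assumes that nodes fail independently and with the same probability `p`, which is a simplification; the function name `p_any_failure` and the value `p = 0.01` are chosen purely for illustration.

```python
def p_any_failure(p, n):
    """Probability that at least one of n nodes fails.

    Assumes independent failures, each with probability p:
    1 - (probability that all n nodes survive).
    """
    return 1.0 - (1.0 - p) ** n


# With unreplicated, partitioned data, any single failure makes some data
# unavailable, so this is also the chance of losing access to part of it.
for n in (1, 10, 100):
    print(n, round(p_any_failure(0.01, n), 4))
```

Under these assumptions the chance of some failure grows quickly with cluster size, which is why systems that divide their data typically also keep backups or replicas.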