Search code examples
sql-servermongodbsharding

Conceptually, how does database sharding differ from a federation


Can anyone provide the key conceptual differences between:

  1. MongoDB Sharding
  2. SQL Server Federated instances.

They appear to be very similar but I don't know if I've missed anything major.

Thanks.


Solution

  • At a high level, they are similar.

    • In MongoDB you have a multiple "replica sets" and you "shard" the data across these sets for horizontal scalability.
    • In SQL Server you have use "replication" across servers and then provide a "partitioned view" across replicated servers to allow for horizontal scalability.

    The big differences are in the implementation and the technologies themselves. For example SQL Server supports joins and actually has some logic for joining across servers. MongoDB has no notion of a join and provides none of this logic.

    Both systems use some form of partition key for partitioning the data. SQL Server requires application-level logic for sending queries to the best node (colocating). MongoDB provides a router program mongos that will correctly route sharded queries without extra application logic.

    There's also the issue of balancing. MongoDB provides automatic balancing, so if one shard becomes overloaded with data, some of that data will be sent to other nodes. With SQL Server you have to manually define the partitions and the servers on which they reside.

    Obviously there are lots of other details, but that looks like the high level differences.