Tags: sql, nosql, scalability, sharding, distributed-database

Distributed Database Computing - Is it really possible within the RDBMS paradigm?


I am asking this in the context of NoSQL - which achieves scalability and performance without being expensive.

So, if I needed to achieve massively parallel distributed computing across databases, what methodologies are available today (within the RDBMS paradigm) to achieve distributed computing with high scalability?

Do database clustering and mirroring contribute in any way towards distributed computing?


Solution

  • I guess you are asking about the scalability of RDBMS databases. NoSQL databases based on Amazon Dynamo or BigTable (I am talking about HBase, Cassandra, etc.) are a whole other topic. There are also commercial products like Oracle Coherence, which is more like a distributed cache and key-value store, to put it crudely.

    Going back to RDBMS:

    Sharding: To scale an RDBMS, one can do custom sharding. Sharding is a technique where you have multiple tables, possibly on multiple hosts, and you decide by some rule which rows go to which table. For example, you can say that rows 1-1M go to table1, rows 1M-2M go to table2, etc. But this is a difficult process from an administration point of view. A lot of large-scale websites scale by relying on sharding. Other techniques worth mentioning are partitioning, MySQL federation, and MySQL Cluster.

    MPP databases: Then there are databases that are very much RDBMSs and that do the distribution and scaling for you. Teradata is the most successful of these companies. A significant number of Fortune 500 companies and a lot of airlines use Teradata. But it's ridiculously expensive. There are newer companies like Greenplum, Vertica, and Netezza. I believe some of these (Greenplum, for one) used PostgreSQL core code at some point.
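
To make the range-based sharding rule above more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not how any particular site does it: each in-memory SQLite database stands in for a separate RDBMS host, and the `users` table, `ROWS_PER_SHARD`, and all function names are hypothetical. The answer's ranges are ambiguous at the boundary, so the sketch assumes id 1,000,000 belongs to the first shard.

```python
import sqlite3

# Assumption: each in-memory SQLite database stands in for one RDBMS host.
# Shard i holds ids in [i * ROWS_PER_SHARD + 1, (i + 1) * ROWS_PER_SHARD].
ROWS_PER_SHARD = 1_000_000
NUM_SHARDS = 3


def make_shards(n):
    """Create n independent databases, each with the same (hypothetical) schema."""
    shards = []
    for _ in range(n):
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
        shards.append(conn)
    return shards


def shard_index(row_id, num_shards=NUM_SHARDS):
    """Map a row id to its shard: 1..1M -> 0, 1M+1..2M -> 1, and so on."""
    if row_id < 1:
        raise ValueError("row ids start at 1")
    idx = (row_id - 1) // ROWS_PER_SHARD
    if idx >= num_shards:
        raise ValueError(f"no shard configured for id {row_id}")
    return idx


def insert_user(shards, row_id, name):
    """Route the write to the single shard that owns this id."""
    conn = shards[shard_index(row_id, len(shards))]
    conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (row_id, name))
    conn.commit()


def get_user(shards, row_id):
    """Route a point lookup to the owning shard; return None if absent."""
    conn = shards[shard_index(row_id, len(shards))]
    row = conn.execute(
        "SELECT name FROM users WHERE id = ?", (row_id,)
    ).fetchone()
    return row[0] if row else None


def count_users(shards):
    """Fan-out query: ask every shard, then combine the partial counts."""
    return sum(
        conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
        for conn in shards
    )


if __name__ == "__main__":
    shards = make_shards(NUM_SHARDS)
    insert_user(shards, 42, "alice")
    insert_user(shards, 1_000_001, "bob")
    insert_user(shards, 2_500_000, "carol")
    print(get_user(shards, 1_000_001))
    print(count_users(shards))
```

Point lookups touch exactly one shard, but anything that spans shards (like `count_users`) has to be fanned out and merged by the application. That, together with rebalancing when a range fills up, is a large part of the administrative burden mentioned above, and it is roughly the work an MPP database takes over for you.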