database-design distributed large-data

Extremely large database design


Suppose I am an Architect who just got a new task to design and implement a storage solution.

Each database can store only 1 TB of data, but the total size of the data is 10 TB. What are the typical choices for designing such a database system so that queries stay efficient? And what is the strategy for CRUD operations in each case?


Solution

  • This sounds like a homework question, but it's a trick question, because it cannot be answered from the information you have given.

    The strategy is the same as for a somewhat smaller database, but the following prerequisite is even more important at the scale of terabytes: you must first know what the queries are before you can choose an optimization strategy for them.

    "CRUD" doesn't describe the queries enough to optimize for them.

    Once you know the queries, you can use one or more of the following optimization methods:

    • Indexing
    • Denormalization
    • Partitioning
    • Sharding
    • Caching
    • Specialized data stores
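
    To make the dependence on the queries concrete, here is a minimal sketch of one of the methods above, sharding, in Python. Everything in it is an assumption for illustration: the names `shard_for` and `ShardedStore` are hypothetical, the shard count of 10 comes from dividing 10 TB by 1 TB with no headroom, and plain dictionaries stand in for the real databases. It shows that CRUD by shard key touches exactly one shard, while a query on any other attribute has to visit all of them.

```python
import hashlib

# Assumption: 10 TB total / 1 TB per database, with no headroom for growth.
NUM_SHARDS = 10


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a shard key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Toy in-memory stand-in: each dict plays the role of one database."""

    def __init__(self, num_shards: int = NUM_SHARDS):
        self.shards = [dict() for _ in range(num_shards)]

    def _shard(self, key: str) -> dict:
        return self.shards[shard_for(key, len(self.shards))]

    # CRUD by shard key: each operation touches exactly one shard.
    def create(self, key: str, value: dict) -> None:
        shard = self._shard(key)
        if key in shard:
            raise KeyError(f"duplicate key: {key}")
        shard[key] = value

    def read(self, key: str):
        return self._shard(key).get(key)

    def update(self, key: str, value: dict) -> None:
        shard = self._shard(key)
        if key not in shard:
            raise KeyError(f"missing key: {key}")
        shard[key] = value

    def delete(self, key: str) -> bool:
        return self._shard(key).pop(key, None) is not None

    # A query that does not filter by shard key must scan every shard.
    def scan(self, predicate):
        return [
            (k, v)
            for shard in self.shards
            for k, v in shard.items()
            if predicate(k, v)
        ]
```

    For example, `store.read("user:42")` goes to a single shard, whereas `store.scan(lambda k, v: v["country"] == "DE")` fans out to all ten. If the second kind of query dominates the workload, a different shard key, a secondary index, or a denormalized copy keyed by country may serve better, which is why the queries have to be known first.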