Tags: algorithm, data-structures, save, large-data-volumes

I have 100 trillion elements, each with a size between 1 byte and 1 trillion bytes (about 0.909 TiB). How can I store and access them very efficiently?


This is an interview question:

Suppose I have 100 trillion elements, each with a size between 1 byte and 1 trillion bytes (about 0.909 TiB). How can I store them and access them very efficiently?

My thoughts: they want to test knowledge of handling large volumes of data efficiently. It is not a question with a single correct answer.

Should I save the elements in some special data structure?

Honestly, I have no idea how to approach this kind of open-ended question.

Any help is really appreciated.


Solution

  • It really depends on the data-set in question. I think the point is for you to discuss the alternatives and describe the various pros/cons.

    Perhaps you should answer their question with more questions!

    • How will it need to be accessed? (sequentially, randomly, some predictable distribution?)
    • Is the order of elements important?
    • Will the size of elements change?
    • How important is insert/remove performance?

    The data structure you choose will depend on what kinds of trade-offs you are willing to make.

    For example, if you only ever need to iterate over the set sequentially, perhaps you should use a linked list, as it has relatively small storage overhead.
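
    Since 100 trillion elements will not fit in memory, the "sequential only" case is usually realized on disk as an append-only log of length-prefixed records rather than a pointer-based list; each record's length prefix effectively tells you where the next record starts. Below is a minimal sketch of that idea; the file name and function names are chosen purely for illustration:

    ```python
    import struct

    DATA_FILE = "record_log.bin"  # hypothetical file name, used only for this sketch

    def append_record(path: str, payload: bytes) -> None:
        """Append one variable-sized element as an 8-byte length prefix followed by its bytes."""
        with open(path, "ab") as f:
            f.write(struct.pack("<Q", len(payload)))
            f.write(payload)

    def iter_records(path: str):
        """Stream the elements back in insertion order, reading one record at a time."""
        with open(path, "rb") as f:
            while True:
                header = f.read(8)
                if len(header) < 8:          # end of file reached
                    return
                (size,) = struct.unpack("<Q", header)
                yield f.read(size)

    if __name__ == "__main__":
        append_record(DATA_FILE, b"first element")
        append_record(DATA_FILE, b"second, much larger element" * 4)
        for record in iter_records(DATA_FILE):
            print(len(record), "bytes")
    ```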

    If instead you need random access, you might want to look into:

    • Hash-tables (constant time lookup, but need a good hash function for the data)
    • Some kind of index / tree structure?
    • Caching! You probably won't be able to keep it all in memory, and even if you could, you would want to take advantage of data locality where possible (see the sketch after this list).
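
    To make the random-access options above a bit more concrete, here is a rough sketch (not a production design) combining the three ideas: a hash-map index from key to (offset, size) in a data file, a single seek-and-read per lookup, and a small LRU cache for recently accessed elements. At the scale in the question the index itself would not fit in RAM and would need to be an on-disk structure such as a B-tree; all class, file, and parameter names here are assumptions for illustration:

    ```python
    from collections import OrderedDict

    class IndexedStore:
        """Append-only data file plus an index from key to (offset, size), with a small LRU cache."""

        def __init__(self, path: str, cache_capacity: int = 1024):
            self.path = path
            self.index = {}               # key -> (offset, size); at real scale this must live on disk
            self.cache = OrderedDict()    # LRU cache of recently read elements
            self.cache_capacity = cache_capacity
            open(path, "ab").close()      # make sure the data file exists

        def put(self, key: str, payload: bytes) -> None:
            """Append the element to the data file and record where it landed."""
            with open(self.path, "ab") as f:
                offset = f.tell()
                f.write(payload)
            self.index[key] = (offset, len(payload))

        def get(self, key: str) -> bytes:
            """Return the element: either a cache hit, or one index lookup plus one seek + read."""
            if key in self.cache:
                self.cache.move_to_end(key)      # keep recently used items hot
                return self.cache[key]
            offset, size = self.index[key]
            with open(self.path, "rb") as f:
                f.seek(offset)
                payload = f.read(size)
            self.cache[key] = payload
            if len(self.cache) > self.cache_capacity:
                self.cache.popitem(last=False)   # evict the least recently used element
            return payload

    if __name__ == "__main__":
        store = IndexedStore("indexed_data.bin")
        store.put("small", b"a few bytes")
        store.put("large", b"x" * 1_000_000)
        print(len(store.get("large")), "bytes")
    ```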

    TL;DR: It's all problem dependent. There are many alternatives.

    This is essentially the same problem faced by file systems / databases.
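
    In particular, a single element here can approach a terabyte, so, like a file system or database, you would never read or write it as one in-memory object; you would stream it in fixed-size blocks (pages). A rough sketch of that idea, where CHUNK_SIZE and the function name are assumptions for illustration:

    ```python
    CHUNK_SIZE = 64 * 1024 * 1024  # 64 MiB blocks; the block size is an arbitrary assumption

    def copy_in_chunks(src_path: str, dst_path: str) -> list:
        """Copy an arbitrarily large element block by block, never holding it all in memory.

        Returns the offset of each block so an index (like the sketch above) could locate them.
        """
        offsets = []
        with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
            while True:
                chunk = src.read(CHUNK_SIZE)
                if not chunk:
                    break
                offsets.append(dst.tell())
                dst.write(chunk)
        return offsets
    ```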