What is the most efficient data structure or algorithm for storing search engine data? Also, which distributed file system would go well with it?
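An inverted index: a map from each term to the list of documents that contain it (its posting list), so a query only has to look up its terms instead of scanning every document.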
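For more details, refer to the architecture of the open-source projects Lucene (the indexing library) and Nutch (a web crawler built on Lucene that runs on Apache Hadoop, whose HDFS covers the distributed file system part of the question).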
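To make the idea concrete, here is a minimal in-memory sketch of an inverted index with AND queries. The names `InvertedIndex`, `tokenize` and `intersect` are hypothetical, the tokenizer is deliberately naive (lowercase, split on non-word characters), and documents are assumed to be added in increasing id order so posting lists stay sorted. Lucene's real on-disk format (segments, compressed postings, scoring) is far more involved.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Naive analyzer (assumption): lowercase, split on non-word characters."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


def intersect(a, b):
    """Two-pointer merge of two sorted posting lists (AND semantics)."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


class InvertedIndex:
    def __init__(self):
        # term -> sorted list of ids of documents containing the term
        self.postings = defaultdict(list)
        self.last_id = None

    def add(self, doc_id, text):
        # Ids must increase so each posting list stays sorted by appending.
        if self.last_id is not None and doc_id <= self.last_id:
            raise ValueError("document ids must be added in increasing order")
        self.last_id = doc_id
        for term in set(tokenize(text)):  # each term recorded once per document
            self.postings[term].append(doc_id)

    def search(self, query):
        terms = tokenize(query)
        if not terms:
            return []
        # .get avoids creating empty entries; shortest list first keeps merges cheap.
        lists = sorted((self.postings.get(t, []) for t in terms), key=len)
        result = lists[0]
        for plist in lists[1:]:
            if not result:
                break
            result = intersect(result, plist)
        return list(result)  # copy so callers cannot mutate the index


idx = InvertedIndex()
idx.add(1, "Lucene is a search library")
idx.add(2, "Nutch is a web crawler built on Lucene")
idx.add(3, "HDFS is a distributed file system")
print(idx.search("lucene"))  # [1, 2]
print(idx.search("lucene crawler"))  # [2]
```

Starting from the shortest posting list is a common design choice: the intersection can never be longer than it, so rare terms prune the work early.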