
How to efficiently store hundreds of thousands of documents?


I'm working on a system that will need to store a lot of documents (PDFs, Word files, etc.). I'm using Solr/Lucene to search for relevant information extracted from those documents, but I also need a place to store the original files so that users can open and download them.

I was thinking about several possibilities:

  • file system - probably not a good idea for storing 1m documents
  • sql database - but I won't need most of its relational features, since I only need to store the binary document and its id, so this might not be the fastest solution
  • no-sql database - I don't have any experience with them, so I'm not sure whether they are any good; there are also many of them, so I don't know which one to pick

The storage I'm looking for should be:

  • fast
  • scalable
  • open-source (not crucial but nice to have)

Which way of storing these files would you recommend?


Solution

  • A filesystem -- as the name suggests -- is designed and optimised to store large numbers of files in an efficient and scalable way.
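One common way to keep any single directory small when storing this many files is to spread them across subdirectories derived from a hash of the document id. The following is only a minimal sketch of that idea, not something stated in the answer: the function names (`path_for`, `store`, `load`), the two-level layout, and the assumption that the id is a filename-safe string are all hypothetical choices made for illustration.

```python
import hashlib
from pathlib import Path


def path_for(root: Path, doc_id: str) -> Path:
    """Map a document id to a sharded path: root/<xx>/<yy>/<doc_id>.

    Assumption: doc_id is safe to use as a file name (no slashes etc.).
    The two shard levels come from the first four hex digits of a
    SHA-256 hash of the id, giving 256 * 256 possible directories.
    """
    digest = hashlib.sha256(doc_id.encode("utf-8")).hexdigest()
    return root / digest[:2] / digest[2:4] / doc_id


def store(root: Path, doc_id: str, data: bytes) -> Path:
    """Write the document bytes to its sharded location and return the path."""
    target = path_for(root, doc_id)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return target


def load(root: Path, doc_id: str) -> bytes:
    """Read the document bytes back using only the id."""
    return path_for(root, doc_id).read_bytes()
```

With this layout, 1m documents would be spread over up to 65,536 directories, roughly 15 files per directory on average. The search index then only needs to hold the document id, since the path can be recomputed from it.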