Search code examples
node.jssearch-engine

Creating an online search engine for large XML database (10GB - 1TB of data) with server running


I have been using Node.js to create a website that will eventually be able to search Google Patent Grant database that provides data in XML format. I've been using MongoDB for the User database, but someone told me that they had a lot of difficulty creating a fast search engine using MongoDB, they also said that it grew very large. What database technology/software should I use in conjunction with Node.js to create an efficient search engine? Would it be a bad idea to have two different database technologies running for one website, e.g. MongoDB and PostgreSQL? I found a technology called Norch on github https://github.com/fergiemcdowall/norch . Would this technology be helpful?


Solution

  • You are going to have hard time matching or beating lucene in text search with either Postgres or mongodb. Thus Solr or Elasticsearch are better options (they both use lucene).

    That being said most people still store their data in something other than the search index and thus implement some kind of synchronization between the search index and data repository.

    Edit based on comment:

    An example combination would be Solr and Postgres. Solr would be your search engine and Postgres would be your data repository. You could then use the DataImportHandler to pull the data from Postgres.