
Solr with multicore, distributed architecture?


I'm planning to use Solr as a search server, and will either develop my own spider or extend Nutch.

I'm trying to design the most economical topology that serves my purpose for now and also remains open-ended, so that it can be scaled out in the future.

I'm planning to use Amazon AWS to host all machines. My question is about the feasibility of the following idea and requirements. Any help would be appreciated!

  1. One Solr node (dedicated to serving queries only, as the query server for the web front end)
  2. On-demand Solr nodes (one or many) (as index servers: Nutch or other spiders would connect to these nodes and flood them with newly crawled content to index)

I'm not sure whether, as with many other search servers (e.g. Microsoft FAST or SharePoint Search), I can deploy a distributed topology with a common database.

I'm willing to use Hadoop or any other distributed file system if that can support such a topology.

So it would mainly look like the following:

                  ---------------------------------------------------

                Hadoop or any other distributed file system / DB system

                  ---------------------------------------------------

                                           ||
                                           ||
                                           ||
                                           VV
                  ----------------                ------------------------

                  Solr query node                  Dedicated Solr index nodes 
                (1 powerful server)         +              (on demand)
                                                 with Nutch or other web spider

                  ----------------                ------------------------

                         ||                                   ||
                         VV                                   VV
                    Web Front End                          Internet       

I'm new to this technology. Many community members on other forums and freelance websites proposed a multicore implementation, but my understanding is that multicore is meant to support separate, distinct data sets (and has nothing to do with clustering or a distributed architecture). Am I correct?

Please advise on feasibility!

Many thanks in advance.

Nilay.


Solution

  • A "core" in Solr describes a "full-text index environment". You can run one Java EE container (Tomcat, Jetty, and so on) to provide different services with different databases and different full-text indexes. For example: one core for product search, one core for mail search, and so on.

    Every running Java EE container with Solr has at least one core. Looking at your topology, it seems you need one front-end Solr environment, probably with one core, and one back-end Solr environment, probably also with one core.

    So you have two servers, two Java EE containers, and two cores. You can see those two cores as "multi" (more than one) core, but in fact these are two single-core installations, which would (probably) use something like replication: http://wiki.apache.org/solr/SolrReplication
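
To make the two-node idea a bit more concrete, here is a minimal sketch of how a client might address the two installations. It only builds URLs and sends nothing. The host names `query-node` and `index-node`, the port, and the core name `web` are made up for illustration. The `/solr/<core>/<handler>` path layout assumes a multicore-style setup (a plain single-core install may expose `/solr/select` directly), and the `replication` handler with `command=fetchindex` assumes the ReplicationHandler described on the wiki page above is enabled, with the master URL set in the query node's own configuration.

```python
from urllib.parse import urlencode

# Hypothetical hosts and core name, chosen only for illustration.
QUERY_NODE = "query-node:8983"   # serves searches for the web front end
INDEX_NODE = "index-node:8983"   # receives documents from Nutch / spiders
CORE = "web"


def core_url(host: str, core: str, handler: str, **params: str) -> str:
    """Build the URL of one request handler on one core of one Solr node."""
    base = f"http://{host}/solr/{core}/{handler}"
    return f"{base}?{urlencode(params)}" if params else base


def search_url(q: str) -> str:
    """Searches go to the query node only."""
    return core_url(QUERY_NODE, CORE, "select", q=q)


def update_url() -> str:
    """Crawled documents are posted to the index node only."""
    return core_url(INDEX_NODE, CORE, "update")


def pull_index_url() -> str:
    """Ask the query node to fetch the latest index from its master.

    Assumes the ReplicationHandler is configured on both nodes.
    """
    return core_url(QUERY_NODE, CORE, "replication", command="fetchindex")


if __name__ == "__main__":
    print(search_url("solr"))
    print(update_url())
    print(pull_index_url())
```

The point of the sketch is the separation: spiders only ever talk to the index node, the front end only ever talks to the query node, and replication is what moves the index between the two. A second core on either node would simply be another name in place of `web`.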