Tags: apache, configuration, web-crawler, nutch

How to change the configuration of Apache Nutch while it is crawling


My crawler (Apache Nutch 2.2.1) is currently crawling. I need to change some of the crawler's settings in nutch-site.xml. I have read that you should avoid changing the configuration while the crawler is running.

My questions are:

  1. Can we change the crawler's configuration while it is running?
  2. If yes, are there any precautions to take when making changes?
  3. If we cannot change the configuration while it is running, what are the drawbacks of changing it anyway?

Solution

  • A Nutch 2.2.1 crawl is a loop of Hadoop jobs, so you can change the crawler's configuration at runtime; however, a change only takes effect in the next Hadoop job. For example, if you change the configuration while the generate job is running, the change takes effect in the fetch job.

    Hope this helps,

    Le Quoc Do
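To illustrate the answer's point, here is a minimal Python simulation. It is not Nutch code. It assumes, as the answer describes, that each step of a crawl round reads nutch-site.xml once when it starts and keeps that snapshot until it finishes. The helpers `write_site_xml`, `load_conf` and `run_crawl_round` and the `during_job` hook are hypothetical; `fetcher.threads.fetch` is a real Nutch property used here only as an example value.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def write_site_xml(path, props):
    """Write a Hadoop-style <configuration> file with the given name/value pairs."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    ET.ElementTree(root).write(path)


def load_conf(path):
    """Read the file once, imitating a job taking its configuration snapshot at startup."""
    root = ET.parse(path).getroot()
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}


def run_crawl_round(site_xml, steps, during_job=None):
    """Run each step as a separate simulated job.

    Each job loads the config when it starts and keeps that snapshot.
    during_job(step) is a hypothetical hook that simulates someone editing
    site_xml while that step is running.
    """
    seen = {}
    for step in steps:
        conf = load_conf(site_xml)  # snapshot taken at job start
        if during_job:
            during_job(step)  # the file may change here, mid-job
        # Record the value this job actually used: its own snapshot.
        seen[step] = conf["fetcher.threads.fetch"]
    return seen


with tempfile.TemporaryDirectory() as tmp:
    site = os.path.join(tmp, "nutch-site.xml")
    write_site_xml(site, {"fetcher.threads.fetch": 10})

    def edit_during_generate(step):
        # Simulate editing nutch-site.xml while the generate job is running.
        if step == "generate":
            write_site_xml(site, {"fetcher.threads.fetch": 50})

    result = run_crawl_round(
        site, ["generate", "fetch", "parse", "updatedb"], edit_during_generate
    )
    print(result)
    # {'generate': '10', 'fetch': '50', 'parse': '50', 'updatedb': '50'}
```

Editing the file during `generate` leaves that job on the old value, while `fetch` and the later steps pick up the new one. This only holds if each job really re-reads the configuration when it starts.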