I have been experimenting with some crawl cycles in Nutch and would like to set up a distributed crawl environment. However, I am not sure how to trigger Nutch for incoming crawl requests in a production system. I have read about the Nutch REST API. Is that the only real option I have? Or is there another way to run Nutch as a continuously running distributed server?
My preferred Nutch version is 1.12.
As sujen stated, there are two options for this:
How to run nutch server on distributed environment