I've worked with Nutch 1.x to crawl websites and Elasticsearch to index the data. I recently came across StormCrawler and like it, especially its streaming nature.
Do I have to initialize the index and create the mappings on the Elasticsearch server that StormCrawler sends its data to?
With Nutch, as long as the ES index was up and running, the mapping took care of itself, apart from some fine-tuning. Is it the same with StormCrawler, or do I have to initialize the index and mapping beforehand?
Great to hear you like StormCrawler.
As explained in the README and in the video tutorial (based on ES 2.x), you should use the ES_IndexInit script to create the indices and set the mappings explicitly. Indexing will probably work without it, since Elasticsearch can infer mappings dynamically, but the inferred mappings would not be optimal.
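For context, ES_IndexInit boils down to a handful of `curl` calls that create the indices with explicit mappings. A minimal sketch of that idea is below; the exact field names, analyzers, and index names here are illustrative, not the ones shipped with StormCrawler, and the mapping syntax shown is the ES 2.x style the tutorial uses:

```shell
# Hypothetical sketch: delete and recreate an index with an explicit mapping,
# similar in spirit to what the ES_IndexInit script does.
curl -XDELETE 'http://localhost:9200/index'

curl -XPUT 'http://localhost:9200/index' -d '
{
  "mappings": {
    "doc": {
      "properties": {
        "url":     { "type": "string", "index": "not_analyzed" },
        "title":   { "type": "string" },
        "content": { "type": "string" }
      }
    }
  }
}'
```

Defining the mapping up front lets you control which fields are analyzed versus stored verbatim (e.g. keeping URLs `not_analyzed` so they can be matched exactly), which is exactly the kind of thing dynamic mapping gets wrong.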