apache-storm stormcrawler

Is there a systematic way to turn individual bolts on or off in StormCrawler?


I have developed a StormCrawler project whose topology contains several additional bolts. My crawler has to run 24/7 without any downtime, so I cannot restart it to change the topology configuration. I want to be able to bypass some bolts (turn them on or off) at runtime. What is the best way to disable and enable bolts in StormCrawler while the topology is running?

thanks


Solution

  • There is no way of doing it out of the box, so you'd have to implement the logic for turning the bolts on and off in the bolts themselves.

    If what you need is to refresh their configuration, you could implement a dynamic mechanism. For instance, store the config of the bolts in an Elasticsearch index and have each bolt reload it periodically.

    We already have something a bit like this with the JSONURLFilterWrapper and its equivalent for ParseFilters. We could have an abstract, Elasticsearch-backed, dynamically configurable bolt. Feel free to open an issue on GitHub if you think this is of interest or, even better, contribute a PR ;-)
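
As a rough illustration of the idea above, here is a minimal, dependency-free sketch of an on/off switch that a bolt could consult on every tuple. Everything in it is an assumption made for illustration: `BoltSwitch`, `isEnabled` and `process` are hypothetical names, not part of StormCrawler or Storm, and the `Supplier` stands in for whatever reads the flags from your store (for example a query against an Elasticsearch index).

```java
import java.util.Map;
import java.util.function.Supplier;
import java.util.function.UnaryOperator;

/**
 * Hypothetical helper (not part of StormCrawler): caches per-bolt on/off
 * flags and re-reads them from an external source at a fixed interval.
 */
class BoltSwitch {
    // Assumption: the supplier wraps whatever store holds the flags,
    // e.g. a query against an Elasticsearch index.
    private final Supplier<Map<String, Boolean>> source;
    private final long refreshIntervalMs;
    private Map<String, Boolean> flags = Map.of();
    private long nextRefreshAt = Long.MIN_VALUE; // forces a load on first use

    BoltSwitch(Supplier<Map<String, Boolean>> source, long refreshIntervalMs) {
        this.source = source;
        this.refreshIntervalMs = refreshIntervalMs;
    }

    /** Returns true unless the store explicitly disables the named bolt. */
    synchronized boolean isEnabled(String boltName, long nowMs) {
        if (nowMs >= nextRefreshAt) {
            try {
                flags = Map.copyOf(source.get());
            } catch (RuntimeException e) {
                // Store unreachable or returned bad data: keep the last known flags.
            }
            nextRefreshAt = nowMs + refreshIntervalMs;
        }
        return flags.getOrDefault(boltName, true);
    }

    /**
     * Stand-in for the body of a bolt's execute(): runs the step when the
     * bolt is enabled, otherwise hands the input on unchanged.
     */
    <T> T process(String boltName, T input, UnaryOperator<T> step, long nowMs) {
        return isEnabled(boltName, nowMs) ? step.apply(input) : input;
    }
}
```

In a real bolt you would create the switch in `prepare()` and call `isEnabled("my-bolt", System.currentTimeMillis())` at the top of `execute()`. When it returns false, the bolt should still emit the incoming tuple's values unchanged (anchored to the input) and ack it, so that downstream bolts keep receiving data. Defaulting to "enabled" for unknown names and keeping the last known flags on a failed reload are design choices of this sketch, chosen so that a problem with the config store does not stall the crawl.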