Building a history of Storm worker assignments


I want to build a centralised history whose entries look like this:

timestamp : topology_name : component_name : topology_id : component_id : VM hostname : VM IP : Worker port

What would be the best way to go about this in Storm? I can think of two approaches:

  1. Reporting this from the prepare() method of a spout/bolt
  2. Write a custom scheduler that reports the assignments

Solution

  • Reporting this from the prepare() method of a spout/bolt

    This requires you to enforce a common base class for your spouts and bolts, and you need to account for subclasses that don't call super.prepare(), e.g. by making prepare() final and having it call a protected abstract prepare0() in which subclasses put their own logic.

  • Write a custom scheduler that reports the assignments

    That's what I'd do, since it's transparent to the spouts and bolts being registered and can be reused without any restrictions or incompatibilities. It's probably more complex, though, and requires more insight into Storm internals.
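A minimal sketch of the prepare()-based approach, assuming a shared base class whose final prepare() writes one history line and then calls a protected abstract prepare0(). In a real bolt the values would most likely come from Storm's TopologyContext (e.g. getThisComponentId(), getThisWorkerPort(), getStormId()) and the topology config; here WorkerContext, HistorySink, ReportingBolt and CounterBolt are hypothetical stand-ins so the example compiles without Storm on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ReportingBoltSketch {

    // Hypothetical stand-in for what a real bolt would read from TopologyContext
    // and the topology config; NOT a Storm type.
    record WorkerContext(String topologyName, String componentName, String topologyId,
                         String componentId, String hostname, String ip, int workerPort) {}

    // Hypothetical sink for history entries (a real one might write to a database).
    interface HistorySink {
        void record(String entry);
    }

    // Formats one entry as "timestamp : topology_name : component_name : topology_id :
    // component_id : VM hostname : VM IP : Worker port".
    static String formatEntry(long timestamp, WorkerContext c) {
        return String.join(" : ", String.valueOf(timestamp), c.topologyName(), c.componentName(),
                c.topologyId(), c.componentId(), c.hostname(), c.ip(),
                String.valueOf(c.workerPort()));
    }

    // Base class: prepare() is final, so no subclass can skip the reporting step.
    abstract static class ReportingBolt {
        private final HistorySink sink;

        ReportingBolt(HistorySink sink) {
            this.sink = sink;
        }

        public final void prepare(Map<String, Object> conf, WorkerContext ctx) {
            sink.record(formatEntry(System.currentTimeMillis(), ctx)); // report first
            prepare0(conf, ctx);                                       // then subclass logic
        }

        // Subclasses put their initialisation here instead of overriding prepare().
        protected abstract void prepare0(Map<String, Object> conf, WorkerContext ctx);
    }

    // Example subclass: it only implements prepare0().
    static class CounterBolt extends ReportingBolt {
        boolean prepared = false;

        CounterBolt(HistorySink sink) {
            super(sink);
        }

        @Override
        protected void prepare0(Map<String, Object> conf, WorkerContext ctx) {
            prepared = true;
        }
    }

    public static void main(String[] args) {
        List<String> history = new ArrayList<>();
        WorkerContext ctx = new WorkerContext("wordcount", "counter", "wordcount-1-1700000000",
                "7", "vm-01", "10.0.0.5", 6701);
        new CounterBolt(history::add).prepare(Map.of(), ctx);
        System.out.println(history.get(0)); // "<current millis> : wordcount : counter : ..."
    }
}
```

The final/abstract split is the point of the pattern: a subclass author cannot forget super.prepare(), because there is nothing to override except prepare0().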
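A rough sketch of the scheduler approach, written as a decorator that lets another scheduler do the actual placement and only records what changed. In real Storm such a class would implement org.apache.storm.scheduler.IScheduler, read placements from the Cluster passed to schedule(Topologies, Cluster) (e.g. via Cluster.getAssignments()), and be enabled through the storm.scheduler setting on Nimbus. Because those details vary between Storm versions, Slot, ComponentKey, ClusterState and Scheduler below are hypothetical stand-ins, not Storm's API. Since the scheduler runs inside Nimbus, the resulting history is centralised by construction.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class HistorySchedulerSketch {

    // Hypothetical stand-ins, NOT Storm's API: a worker slot and one placed component.
    record Slot(String hostname, String ip, int port) {}

    record ComponentKey(String topologyName, String componentName,
                        String topologyId, String componentId) {}

    // Simplified view of the cluster a scheduler sees (stand-in for Storm's Cluster).
    static class ClusterState {
        final Map<ComponentKey, Slot> assignments = new LinkedHashMap<>();
    }

    // Stand-in for IScheduler.schedule(Topologies, Cluster).
    interface Scheduler {
        void schedule(ClusterState cluster);
    }

    // Decorator: lets the wrapped scheduler place components, then records changes.
    static class HistoryRecordingScheduler implements Scheduler {
        private final Scheduler delegate;
        private final List<String> history;
        private final Map<ComponentKey, Slot> lastSeen = new HashMap<>();

        HistoryRecordingScheduler(Scheduler delegate, List<String> history) {
            this.delegate = delegate;
            this.history = history;
        }

        @Override
        public void schedule(ClusterState cluster) {
            delegate.schedule(cluster);
            long now = System.currentTimeMillis();
            for (Map.Entry<ComponentKey, Slot> e : cluster.assignments.entrySet()) {
                // schedule() may run many times with unchanged placements,
                // so only new or moved components produce a history line.
                if (!Objects.equals(lastSeen.get(e.getKey()), e.getValue())) {
                    history.add(format(now, e.getKey(), e.getValue()));
                }
            }
            lastSeen.clear();
            lastSeen.putAll(cluster.assignments);
        }
    }

    static String format(long ts, ComponentKey k, Slot s) {
        return String.join(" : ", String.valueOf(ts), k.topologyName(), k.componentName(),
                k.topologyId(), k.componentId(), s.hostname(), s.ip(), String.valueOf(s.port()));
    }

    public static void main(String[] args) {
        List<String> history = new ArrayList<>();
        ComponentKey key = new ComponentKey("wordcount", "counter", "wordcount-1-1700000000", "7");
        int[] port = {6700};
        // Toy delegate: always places the component on vm-01 at the current port.
        Scheduler toy = c -> c.assignments.put(key, new Slot("vm-01", "10.0.0.5", port[0]));
        HistoryRecordingScheduler scheduler = new HistoryRecordingScheduler(toy, history);
        ClusterState cluster = new ClusterState();

        scheduler.schedule(cluster); // first placement: recorded
        scheduler.schedule(cluster); // unchanged: not recorded
        port[0] = 6701;
        scheduler.schedule(cluster); // moved to another worker port: recorded
        history.forEach(System.out::println);
    }
}
```

Diffing against the last seen placements keeps the history to actual assignment changes; removals (e.g. a killed topology) are not logged in this sketch and would need a second pass over lastSeen.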