
How does Storm enforce the component memory constraint?


Storm lets you configure the memory size per component (bolt/spout) with the setMemoryLoad function. How does the worker process enforce this constraint per executor/task, since they all run in the same JVM?


Solution

  • I think you are misunderstanding what setMemoryLoad is for.

    setMemoryLoad and similar methods (e.g. setCPULoad) don't determine how much memory or CPU is actually allocated to the component. They are hints for the resource aware scheduler (RAS): https://storm.apache.org/releases/2.0.0-SNAPSHOT/Resource_Aware_Scheduler_overview.html.

    You configure how much memory/CPU is available on each supervisor, and the resource aware scheduler uses that information, along with the memory/CPU loads you set on your components, to try to distribute your components across your supervisors in a way that makes sense. You might want this if e.g. you want to distribute your heavy components evenly across your cluster, or if your supervisors are heterogeneous (that is, you have some weak machines and some strong machines and you don't want Storm to run very busy components on the weak machines).
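
    To make the scheduling idea concrete, here is a toy sketch in plain Java. It is not Storm's resource aware scheduler and uses no Storm API; the class ToyPlacement, the component and supervisor names, and the megabyte figures are all made up for illustration. It places each component on the supervisor with the most free memory that can still fit it, which is one simple way declared loads and supervisor capacities could drive placement.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: NOT Storm's ResourceAwareScheduler. All names and numbers
// here are hypothetical and exist purely to illustrate "loads as hints".
final class ToyPlacement {

    /**
     * Assigns each component to the supervisor that currently has the most
     * remaining memory and can still fit the component's declared load.
     * Returns a map of component name -> supervisor name.
     */
    static Map<String, String> place(Map<String, Integer> componentLoadMb,
                                     Map<String, Integer> supervisorCapacityMb) {
        // Work on a copy so the caller's capacity map is left untouched.
        Map<String, Integer> remaining = new LinkedHashMap<>(supervisorCapacityMb);
        Map<String, String> assignment = new LinkedHashMap<>();

        for (Map.Entry<String, Integer> component : componentLoadMb.entrySet()) {
            String best = null;
            for (Map.Entry<String, Integer> supervisor : remaining.entrySet()) {
                boolean fits = supervisor.getValue() >= component.getValue();
                // Strictly-greater comparison: on a tie the earlier supervisor wins.
                if (fits && (best == null || supervisor.getValue() > remaining.get(best))) {
                    best = supervisor.getKey();
                }
            }
            if (best == null) {
                throw new IllegalStateException("No supervisor can fit " + component.getKey());
            }
            // Reserve the declared load on the chosen supervisor.
            remaining.put(best, remaining.get(best) - component.getValue());
            assignment.put(component.getKey(), best);
        }
        return assignment;
    }

    public static void main(String[] args) {
        Map<String, Integer> loads = new LinkedHashMap<>();
        loads.put("heavyBolt", 1024);
        loads.put("lightBolt", 256);
        loads.put("spout", 256);

        Map<String, Integer> supervisors = new LinkedHashMap<>();
        supervisors.put("weak", 512);
        supervisors.put("strong", 1536);

        // prints {heavyBolt=strong, lightBolt=weak, spout=strong}
        System.out.println(place(loads, supervisors));
    }
}
```

    With these made-up numbers the heavy bolt can never land on the weak supervisor, because its declared load exceeds that machine's capacity. Note that nothing in this model limits what a component actually uses at runtime; the loads only influence where things are placed.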

    I believe Storm does have the ability to enforce the CPU/memory limit you specify, but you have to enable cgroups for that: http://storm.apache.org/releases/2.0.0-SNAPSHOT/cgroups_in_storm.html. I'm not terribly familiar with that part of the code, but it allows you to either set a flat limit per worker JVM, or try to set the limit dynamically based on the resource aware scheduler's assignment.

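    As I said, I'm not familiar with the cgroups support, so this may be wrong, but I'd imagine Storm could set the cgroup limit for a worker JVM when booting that JVM. So for example, if the RAS assigns one instance of boltA (say, with a memory load of 512 MB) and one instance of boltB (with a memory load of 1024 MB) to your worker, the entire worker would just have a combined limit of 512+1024 MB. In other words, the limit would apply to the worker process as a whole, not to each executor inside it.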
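
    The arithmetic in that example, as a minimal sketch. The class and method names are hypothetical and this is not Storm code; it only shows the limit being computed for the worker as a whole, from the sum of the memory loads assigned to it.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Storm: sums the declared memory loads of
// all executors assigned to one worker to get a single per-worker limit.
final class WorkerMemoryLimit {

    static int combinedLimitMb(List<Integer> executorLoadsMb) {
        int total = 0;
        for (int load : executorLoadsMb) {
            total += load;
        }
        return total;
    }

    public static void main(String[] args) {
        // One instance of boltA (512 MB) and one of boltB (1024 MB) on the same worker.
        int limit = combinedLimitMb(Arrays.asList(512, 1024));
        System.out.println(limit + " MB"); // prints 1536 MB
    }
}
```

    A limit like this says nothing about how the two executors share the 1536 MB between them; inside the JVM they still draw from the same heap.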