I have a Storm topology for which I call setNumWorkers(1).
When I look at the Storm UI report on this running topology, I see Num workers set to 1.
However, when I log into the node running the supervisor, I see two processes that have the same values for -Dworker.id and -Dworker.port.
I am including the ps output for these two processes below.
My question is: why are there two processes that appear to be configured as worker processes when I only requested one? (Note: the Storm UI confirms that I have only one worker.)
This is important to me because when I profile or analyze the resources consumed by my topology, I want to know which process to zero in on.
ps output
root 787 20.0 0.6 5858228 78388 ? Sl 05:04 0:00 /usr/lib/jvm/java-8-openjdk-amd64/bin/java -cp /opt/apache-storm-0.10.0/lib/log4j-slf4j-impl-2.1.jar:/opt/apache-storm-0.10.0/lib/servlet-api-2.5.jar:/opt/apache-storm-0.10.0/lib/clojure-1.6.0.jar:/opt/apache-storm-0.10.0/lib/slf4j-api-1.7.7.jar:/opt/apache-storm-0.10.0/lib/hadoop-auth-2.4.0.jar:/opt/apache-storm-0.10.0/lib/log4j-api-2.1.jar:/opt/apache-storm-0.10.0/lib/disruptor-2.10.4.jar:/opt/apache-storm-0.10.0/lib/storm-core-0.10.0.jar:/opt/apache-storm-0.10.0/lib/log4j-over-slf4j-1.6.6.jar:/opt/apache-storm-0.10.0/lib/log4j-core-2.1.jar:/opt/apache-storm-0.10.0/lib/asm-4.0.jar:/opt/apache-storm-0.10.0/lib/kryo-2.21.jar:/opt/apache-storm-0.10.0/lib/reflectasm-1.07-shaded.jar:/opt/apache-storm-0.10.0/lib/minlog-1.2.jar:/opt/apache-storm-0.10.0/conf:/opt/apache-storm-0.10.0/storm-local/supervisor/stormdist/big-storm-job-1-1487739502/stormjar.jar -Dlogfile.name=big-storm-job-1-1487739502-worker-6700.log -Dstorm.home=/opt/apache-storm-0.10.0 -Dstorm.id=big-storm-job-1-1487739502 -Dworker.id=e8e03e95-1fcc-492a-b5e4-51ef7b8db2ee -Dworker.port=6700 -Dstorm.log.dir=/opt/apache-storm-0.10.0/logs -Dlog4j.configurationFile=/opt/apache-storm-0.10.0/log4j2/worker.xml backtype.storm.LogWriter /usr/lib/jvm/java-8-openjdk-amd64/bin/java -server -Xmx768m -Djava.library.path=/opt/apache-storm-0.10.0/storm-local/supervisor/stormdist/big-storm-job-1-1487739502/resources/Linux-amd64:/opt/apache-storm-0.10.0/storm-local/supervisor/stormdist/big-storm-job-1-1487739502/resources:/usr/local/lib:/opt/local/lib:/usr/lib -Dlogfile.name=big-storm-job-1-1487739502-worker-6700.log -Dstorm.home=/opt/apache-storm-0.10.0 -Dstorm.conf.file= -Dstorm.options= -Dstorm.log.dir=/opt/apache-storm-0.10.0/logs -Dlogging.sensitivity=S3 -Dlog4j.configurationFile=/opt/apache-storm-0.10.0/log4j2/worker.xml -Dstorm.id=big-storm-job-1-1487739502 -Dworker.id=e8e03e95-1fcc-492a-b5e4-51ef7b8db2ee -Dworker.port=6700 -cp /opt/apache-storm-0.10.0/lib/log4j-slf4j-impl-2.1.jar:/opt/apache-storm-0.10.0/lib/servlet-api-2.5.jar:/opt/apache-storm-0.10.0/lib/clojure-1.6.0.jar:/opt/apache-storm-0.10.0/lib/slf4j-api-1.7.7.jar:/opt/apache-storm-0.10.0/lib/hadoop-auth-2.4.0.jar:/opt/apache-storm-0.10.0/lib/log4j-api-2.1.jar:/opt/apache-storm-0.10.0/lib/disruptor-2.10.4.jar:/opt/apache-storm-0.10.0/lib/storm-core-0.10.0.jar:/opt/apache-storm-0.10.0/lib/log4j-over-slf4j-1.6.6.jar:/opt/apache-storm-0.10.0/lib/log4j-core-2.1.jar:/opt/apache-storm-0.10.0/lib/asm-4.0.jar:/opt/apache-storm-0.10.0/lib/kryo-2.21.jar:/opt/apache-storm-0.10.0/lib/reflectasm-1.07-shaded.jar:/opt/apache-storm-0.10.0/lib/minlog-1.2.jar:/opt/apache-storm-0.10.0/conf:/opt/apache-storm-0.10.0/storm-local/supervisor/stormdist/big-storm-job-1-1487739502/stormjar.jar backtype.storm.daemon.worker big-storm-job-1-1487739502 8fde2226-4b32-406d-8809-81ed88e5ae1f 6700 e8e03e95-1fcc-492a-b5e4-51ef7b8db2ee
root 805 203 2.0 4308648 255336 ? Sl 05:04 0:06 /usr/lib/jvm/java-8-openjdk-amd64/bin/java -server -Xmx768m -Djava.library.path=/opt/apache-storm-0.10.0/storm-local/supervisor/stormdist/big-storm-job-1-1487739502/resources/Linux-amd64:/opt/apache-storm-0.10.0/storm-local/supervisor/stormdist/big-storm-job-1-1487739502/resources:/usr/local/lib:/opt/local/lib:/usr/lib -Dlogfile.name=big-storm-job-1-1487739502-worker-6700.log -Dstorm.home=/opt/apache-storm-0.10.0 -Dstorm.conf.file= -Dstorm.options= -Dstorm.log.dir=/opt/apache-storm-0.10.0/logs -Dlogging.sensitivity=S3 -Dlog4j.configurationFile=/opt/apache-storm-0.10.0/log4j2/worker.xml -Dstorm.id=big-storm-job-1-1487739502 -Dworker.id=e8e03e95-1fcc-492a-b5e4-51ef7b8db2ee -Dworker.port=6700 -cp /opt/apache-storm-0.10.0/lib/log4j-slf4j-impl-2.1.jar:/opt/apache-storm-0.10.0/lib/servlet-api-2.5.jar:/opt/apache-storm-0.10.0/lib/clojure-1.6.0.jar:/opt/apache-storm-0.10.0/lib/slf4j-api-1.7.7.jar:/opt/apache-storm-0.10.0/lib/hadoop-auth-2.4.0.jar:/opt/apache-storm-0.10.0/lib/log4j-api-2.1.jar:/opt/apache-storm-0.10.0/lib/disruptor-2.10.4.jar:/opt/apache-storm-0.10.0/lib/storm-core-0.10.0.jar:/opt/apache-storm-0.10.0/lib/log4j-over-slf4j-1.6.6.jar:/opt/apache-storm-0.10.0/lib/log4j-core-2.1.jar:/opt/apache-storm-0.10.0/lib/asm-4.0.jar:/opt/apache-storm-0.10.0/lib/kryo-2.21.jar:/opt/apache-storm-0.10.0/lib/reflectasm-1.07-shaded.jar:/opt/apache-storm-0.10.0/lib/minlog-1.2.jar:/opt/apache-storm-0.10.0/conf:/opt/apache-storm-0.10.0/storm-local/supervisor/stormdist/big-storm-job-1-1487739502/stormjar.jar backtype.storm.daemon.worker big-storm-job-1-1487739502 8fde2226-4b32-406d-8809-81ed88e5ae1f 6700 e8e03e95-1fcc-492a-b5e4-51ef7b8db2ee
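For context, here is roughly how the topology gets configured and submitted. This is only a minimal sketch: the class name and the spout/bolt wiring are placeholders, and the only piece taken from my actual code is the setNumWorkers(1) call (the topology name matches the storm.id prefix in the ps output).

import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;

public class BigStormJob {
    public static void main(String[] args) throws Exception {
        // Spout/bolt wiring omitted -- placeholder only.
        TopologyBuilder builder = new TopologyBuilder();

        Config conf = new Config();
        // Request exactly one worker process for the whole topology.
        conf.setNumWorkers(1);

        StormSubmitter.submitTopology("big-storm-job", conf, builder.createTopology());
    }
}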
In case it helps anyone reading this to get a better picture of my environment, here is my Docker Compose configuration for Storm (and the other services it talks to).
version: '2'
services:
  zookeeper:
    image: wurstmeister/zookeeper
    container_name: zk
    hostname: zk
    ports:
      - "2181:2181"
    networks:
      storm:
  kafka:
    image: wurstmeister/kafka:0.8.2.2-1
    container_name: kafka
    hostname: kafka
    ports:
      - "9092:9092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ADVERTISED_HOST_NAME: 10.211.55.4
      KAFKA_ZOOKEEPER_CONNECT: 10.211.55.4
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
  nimbus:
    image: sunside/storm-nimbus
    container_name: storm-nimbus
    hostname: storm-nimbus
    ports:
      - "49773:49772"
      - "49772:49773"
      - "49627:49627"
    environment:
      - "LOCAL_HOSTNAME=nimbus"
      - "ZOOKEEPER_ADDRESS=zk"
      - "ZOOKEEPER_PORT=2181"
      - "NIMBUS_ADDRESS=nimbus"
      - "NIMBUS_THRIFT_PORT=49627"
      - "DRPC_PORT=49772"
      - "DRPCI_PORT=49773"
    volumes:
      - /media/psf/Home/dev/storm-pipeline:/pipeline
    networks:
      storm:
  supervisor:
    image: sunside/storm-supervisor
    container_name: storm-supervisor
    hostname: storm-supervisor
    ports:
      - "8000:8000"
    environment:
      - "LOCAL_HOSTNAME=supervisor"
      - "NIMBUS_ADDRESS=nimbus"
      - "NIMBUS_THRIFT_PORT=49627"
      - "DRPC_PORT=49772"
      - "DRPCI_PORT=49773"
      - "ZOOKEEPER_ADDRESS=zk"
      - "ZOOKEEPER_PORT=2181"
    networks:
      storm:
  ui:
    image: sunside/storm-ui
    container_name: storm-ui
    hostname: storm-ui
    ports:
      - "8888:8080"
    environment:
      - "LOCAL_HOSTNAME=ui"
      - "NIMBUS_ADDRESS=nimbus"
      - "NIMBUS_THRIFT_PORT=49627"
      - "DRPC_PORT=49772"
      - "DPRCI_PORT=49773"
      - "ZOOKEEPER_ADDRESS=zk"
      - "ZOOKEEPER_PORT=2181"
    networks:
      storm:
  elasticsearch:
    image: elasticsearch:2.3
    container_name: elasticsearch
    hostname: elasticsearch
    ports:
      - "9200:9200"
    networks:
      storm:
networks:
  storm:
    external: true
The answer to the mystery is that only one of the two processes is a real worker process (the class being executed is backtype.storm.daemon.worker). The other process printed by ps is a log writer process running the class backtype.storm.LogWriter; as the ps output shows, the full worker command is passed to it as arguments, so it launches the actual worker JVM and handles its log output. For profiling, the process to zero in on is the one whose main class is backtype.storm.daemon.worker (PID 805 in the output above).
I should have noticed this in the command lines of the two processes. Oh well... now we know!
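One practical follow-up for the profiling angle (my own addition, not something the original setup used): if you want profiler or GC flags to land on the real worker JVM rather than on the LogWriter, Storm appends the value of topology.worker.childopts to the worker's java command line. A minimal sketch, with example flags chosen purely for illustration:

import backtype.storm.Config;

public class WorkerProfilingOpts {
    // Returns a Config whose extra JVM flags end up on the worker JVM
    // (the backtype.storm.daemon.worker process), not on the LogWriter wrapper.
    static Config buildConfig() {
        Config conf = new Config();
        conf.setNumWorkers(1);
        // Illustrative flags only; Storm appends topology.worker.childopts
        // to the worker's command line when the supervisor launches it.
        conf.put(Config.TOPOLOGY_WORKER_CHILDOPTS,
                 "-XX:+PrintGCDetails -XX:+PrintGCTimeStamps");
        return conf;
    }
}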