Should Storm spouts sleep() or yield()?


The Storm documentation for nextTuple() says the following:

When there are no tuples to emit, it is courteous to have nextTuple sleep for a short amount of time (like a single millisecond) so as not to waste too much CPU.

Storm's Utils class seems to provide a method for exactly that: Utils.sleep(long millis).

However, in one of the spouts provided by Apache Storm itself, MqttSpout, a different approach is used:

public void nextTuple() {
    AckableMessage tm = this.incoming.poll();
    if(tm != null){
        ...
    } else {
        Thread.yield();
    }
}

I suspect that the Storm authors may have made a mistake there, since Thread.yield() itself has the following notes in the docs:

A hint to the scheduler that the current thread is willing to yield its current use of a processor. The scheduler is free to ignore this hint.

and

It is rarely appropriate to use this method.

So which one should I use? I suspect that using Thread.yield() would cause unnecessary CPU usage.


Solution

  • Your spout shouldn't sleep at all. If a call to nextTuple emits nothing, Storm itself sleeps before making the next call, at least in the versions I'm familiar with (1.0.0 onward).

    See https://github.com/apache/storm/blob/v1.2.2/storm-core/src/clj/org/apache/storm/daemon/executor.clj#L667 for reference. The default implementation of the wait strategy sleeps for a configurable interval every time it is called (default 1ms). You can control the interval with https://github.com/apache/storm/blob/v1.2.2/storm-core/src/jvm/org/apache/storm/Config.java#L1886 or replace the wait strategy entirely with https://github.com/apache/storm/blob/v1.2.2/storm-core/src/jvm/org/apache/storm/Config.java#L1879.

    Storm 2.0.0 will have a slightly different behavior (progressively longer sleeps), but it's the same basic idea.

    I think the javadoc for nextTuple is misleading, so we should probably amend it. I'm also not sure what the Thread.yield is doing in the mqtt spout. It looks like it has been there since the spout was added. If you ask on one of the mailing lists (https://storm.apache.org/getting-help.html), the author is still around and might know why it's there.

    If you like, you can raise issues at https://issues.apache.org/jira/secure/Dashboard.jspa to address this :)
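
To make the division of labour concrete, here is a minimal, self-contained sketch of the idea described above. It is not Storm's executor code: Spout, WaitStrategy, SleepWait, QueueSpout and runLoop are hypothetical stand-ins that use only the JDK. The spout returns immediately when its queue is empty, and the calling loop waits only after a call that emitted nothing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class SpoutLoopSketch {

    /** Hypothetical stand-in for a spout; returns true if the call emitted a tuple. */
    interface Spout {
        boolean nextTuple();
    }

    /** Hypothetical stand-in for a pluggable wait strategy, invoked after an empty call. */
    interface WaitStrategy {
        void emptyEmit();
    }

    /** Sleeps for a fixed interval, mirroring the "sleep 1 ms by default" behaviour. */
    static final class SleepWait implements WaitStrategy {
        private final long millis;

        SleepWait(long millis) {
            this.millis = millis;
        }

        @Override
        public void emptyEmit() {
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                // Preserve the interrupt so the caller can react to it.
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Polls a queue and simply returns when it is empty: no sleep(), no yield(). */
    static final class QueueSpout implements Spout {
        private final Queue<String> incoming;
        final List<String> emitted = new ArrayList<>();

        QueueSpout(Queue<String> incoming) {
            this.incoming = incoming;
        }

        @Override
        public boolean nextTuple() {
            String msg = incoming.poll();
            if (msg == null) {
                return false; // nothing to emit; let the caller decide how to wait
            }
            emitted.add(msg); // stands in for collector.emit(...)
            return true;
        }
    }

    /** Calls nextTuple a fixed number of times and returns how often it had to wait. */
    static int runLoop(Spout spout, WaitStrategy wait, int iterations) {
        int waits = 0;
        for (int i = 0; i < iterations; i++) {
            if (!spout.nextTuple()) {
                wait.emptyEmit();
                waits++;
            }
        }
        return waits;
    }

    public static void main(String[] args) {
        Queue<String> incoming = new ConcurrentLinkedQueue<>(Arrays.asList("a", "b", "c"));
        QueueSpout spout = new QueueSpout(incoming);
        int waits = runLoop(spout, new SleepWait(1), 10);
        System.out.println("emitted=" + spout.emitted + " waits=" + waits);
        // prints: emitted=[a, b, c] waits=7
    }
}
```

In a real topology the waiting would come from whatever wait strategy Storm is configured with, so the spout's else branch can stay empty rather than call Thread.yield() or Utils.sleep().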