Hystrix CircuitBreakerSleepWindowInMilliseconds doesn't work as expected

I am testing the Hystrix circuit breaker implementation. This is what the command class looks like:

import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;
import com.netflix.hystrix.HystrixCommandProperties;
import com.netflix.hystrix.HystrixThreadPoolProperties;

public class CommandOne extends HystrixCommand<String>
{
    private MyExternalService service;
    // counts how many times run() was actually invoked
    public static int runCount = 0;

    public CommandOne(MyExternalService service)
    {
        super(Setter.withGroupKey(HystrixCommandGroupKey.Factory.asKey("AAA"))
                .andThreadPoolPropertiesDefaults(
                        HystrixThreadPoolProperties.Setter()
                                .withMetricsRollingStatisticalWindowInMilliseconds(10000))
                .andCommandPropertiesDefaults(HystrixCommandProperties.Setter()
                        .withCircuitBreakerEnabled(true)
                        .withCircuitBreakerErrorThresholdPercentage(20)
                        .withCircuitBreakerRequestVolumeThreshold(10)
                        .withExecutionTimeoutInMilliseconds(30)
                        .withCircuitBreakerSleepWindowInMilliseconds(100000)));

        this.service = service;
    }

    @Override
    protected String run()
    {
        runCount++;
        return service.callMethod();
    }

    @Override
    protected String getFallback()
    {
        return "default";
    }
}

The command is called like this:

public class AnotherClass
{
    private MyExternalService service;

    public String callCmd()
    {
        CommandOne command = new CommandOne(service);
        return command.execute();
    }
}

In the test I perform the following steps:

@Test
public void test()
{
    AnotherClass anotherClass = new AnotherClass();

    // stubbing exception on my service
    when(service.callMethod()).thenThrow(new RuntimeException());
    for (int i = 0; i < 1000; i++)
    {
        anotherClass.callCmd();
    }
    System.out.println("Run method was called times = " + CommandOne.runCount);
}

What I expect with the given command configuration: MyExternalService.callMethod() should be called 10 times (RequestVolumeThreshold) and then not be called again for 100000 ms (a long time). In my test case I expect CommandOne.runCount = 10. In reality, MyExternalService.callMethod() is called between 150 and 200 times (CommandOne.runCount is 150-200). Why is this happening? What did I do wrong?

Solution

According to the Hystrix docs, a health snapshot is taken once every 500 ms by default. This means that nothing that happens during the first 500 ms affects the circuit breaker state. In your example runCount varies from run to run because your machine executes a different number of requests in those 500 ms each time; only after that interval is the health snapshot refreshed and the circuit opened.
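To make the timing effect concrete, here is a minimal sketch in plain Java with a virtual clock. It is a toy model, not Hystrix's actual algorithm: the class name SnapshotIntervalSketch, the method runsBeforeTrip and the assumption that every request fails and takes exactly the same time are all made up for illustration. It only shows that the breaker cannot react before the next snapshot, no matter how many failures have already happened.

```java
class SnapshotIntervalSketch {

    /**
     * Counts how many requests actually execute before the (simulated)
     * circuit opens. Every request is assumed to fail and to take exactly
     * requestMicros of virtual time. The breaker only sees the failure
     * count as of the last snapshot, which is refreshed every
     * snapshotIntervalMicros.
     */
    static int runsBeforeTrip(int totalRequests, long requestMicros,
                              long snapshotIntervalMicros, int volumeThreshold) {
        long now = 0;                          // virtual clock, microseconds
        long nextSnapshot = snapshotIntervalMicros;
        int failuresRecorded = 0;              // real number of failures so far
        int failuresInSnapshot = 0;            // what the breaker currently sees
        int runCount = 0;

        for (int i = 0; i < totalRequests; i++) {
            // Refresh the snapshot once for every interval that has elapsed.
            while (now >= nextSnapshot) {
                failuresInSnapshot = failuresRecorded;
                nextSnapshot += snapshotIntervalMicros;
            }
            // The breaker decides based on the (possibly stale) snapshot.
            if (failuresInSnapshot >= volumeThreshold) {
                break;                         // circuit open: remaining calls are short-circuited
            }
            runCount++;                        // request executes and fails
            failuresRecorded++;
            now += requestMicros;
        }
        return runCount;
    }

    public static void main(String[] args) {
        // 3 ms per request, 500 ms snapshot interval, threshold 10
        System.out.println(runsBeforeTrip(1000, 3_000, 500_000, 10));   // prints 167
        // 100 ms per request, 50 ms snapshot interval, threshold 10
        System.out.println(runsBeforeTrip(100, 100_000, 50_000, 10));   // prints 10
    }
}
```

With fast requests and a 500 ms snapshot interval, the number of executed calls depends only on how many requests fit into the interval, which is why it changes with machine speed. When the snapshot interval is shorter than a single request, the count in this model settles at the volume threshold.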

Please take a look at a slightly simplified example:

public class CommandOne extends HystrixCommand<String> {

    private String content;
    public static int runCount = 0;


    public CommandOne(String s) {
        super(Setter.withGroupKey
                (HystrixCommandGroupKey.Factory.asKey("SnapshotIntervalTest"))
                .andCommandPropertiesDefaults(
                        HystrixCommandProperties.Setter()
                                .withCircuitBreakerSleepWindowInMilliseconds(500000)
                                .withCircuitBreakerRequestVolumeThreshold(9)
                                .withMetricsHealthSnapshotIntervalInMilliseconds(50)
                                .withMetricsRollingStatisticalWindowInMilliseconds(100000)
                )
        );
        this.content = s;
    }

    @Override
    public String run() throws Exception {
        Thread.sleep(100);
        runCount++;
        if ("".equals(content)) {
            throw new Exception();
        }
        return content;
    }

    @Override
    protected String getFallback() {
        return "FAILURE-" + content;
    }

}

@Test
void test() {

    for (int i = 0; i < 100; i++) {
        // an empty string makes run() throw, so every call fails
        CommandOne commandOne = new CommandOne("");
        commandOne.execute();
    }
    Assertions.assertEquals(10, CommandOne.runCount);
}

In this example I've added:

withMetricsHealthSnapshotIntervalInMilliseconds(50) to make Hystrix take a snapshot every 50 ms.
Thread.sleep(100); to make the requests a bit slower. Without it they complete in less than 50 ms and we run into the original issue again.

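Despite all these modifications I still saw some random failures. I therefore came to the conclusion that testing Hystrix like this is not a good idea. Instead we could test:

1) Fallback/success flow behavior, by manually forcing the circuit open or closed (for example with withCircuitBreakerForceOpen(true) or withCircuitBreakerForceClosed(true)).

2) Configuration, i.e. that the command is created with the expected property values.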