Confusing benchmarking results for StringBuilder vs StringBuffer with JUnit

I ran the following JUnit test case and consistently got better performance from StringBuffer than from StringBuilder. I'm sure I'm missing something, but I can't figure out why StringBuffer comes out faster than StringBuilder.

My test case is:

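    public void stringTest(){

        String s1 = "s1";
        String s2 = "s2";

        for(int i=0;i<=100000;i++){
            s1 = s1+s2;
        }
        System.out.println(s1); // the "SysOut" mentioned below; exact placement assumed
    }

    public void stringBuilderTest(){

        StringBuilder s1 = new StringBuilder("s1");
        String s2 = "s2";

        for(int i=0;i<=100000;i++){
            s1.append(s2);
        }
        System.out.println(s1); // exact placement assumed
    }

    public void stringBufferTest(){

        StringBuffer s1 = new StringBuffer("s1");
        String s2 = "s2";

        for(int i=0;i<=100000;i++){
            s1.append(s2);
        }
        System.out.println(s1); // exact placement assumed
    }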

Here are the JUnit test results:

[Images: JUnit Test 1, JUnit Test 2]

As you can see in the results above, stringBufferTest finished sooner than stringBuilderTest. Why is that? I know this shouldn't happen in theory, so how am I getting this result?


Following @Henry's comment, I removed the SysOuts and the results changed dramatically.

[Image: JUnit Test 3, after removing the SysOut]

Then I increased the loop count from 100000 to 1000000 and got the realistic results I had expected all along:

[Image: JUnit Test 3, after removing the SysOut, 1000000 iterations]

My new questions are:

  1. Why do I get a significant performance improvement when I remove the SysOut?
  2. When the load is increased to 1000000 iterations, StringBuffer gives better results than StringBuilder. Why is that?


  • I'm afraid your benchmark is basically invalid.

    Microbenchmarks (little code, completing relatively quickly) are notoriously hard to get right in Java (and in many other languages, too). Some of the problems that make these benchmarks hard:

    • Depending on how the test is written, the compiler might optimize some code away, causing the benchmark to not measure what you wanted to measure.
    • Optimizations that occur at runtime (instead of compile time) often happen incrementally. Initially, the code is interpreted. Only when some parts of it are executed often enough will the Just-in-Time compiler generate optimized machine code. Microbenchmarks often do not trigger this optimization.
    • Even if the code is eligible for optimization, depending on how it's written, the Just-in-Time compiler might not be able to swap in fully optimized code for a method that is already running (for example, one long loop), so it has to fall back to less effective optimizations. Search for "On stack replacement" for more details.
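To make the warm-up effect from the list above visible, here is a minimal sketch using only the standard library. The class name `WarmupSketch`, its methods, and the `sink` field are hypothetical names chosen for illustration. It runs the same append loop for several rounds and records the time of each round, and it adds every result's length to a field so the work has a visible effect. Treat it as a way to watch timings drift between rounds, not as a trustworthy benchmark.

```java
import java.util.function.Supplier;

public class WarmupSketch {

    // Accumulates result lengths so the JIT cannot treat the workload as unused.
    public static long sink;

    // Workload from the question: append "s2" to a StringBuilder n times.
    public static String buildWithBuilder(int n) {
        StringBuilder sb = new StringBuilder("s1");
        for (int i = 0; i < n; i++) {
            sb.append("s2");
        }
        return sb.toString();
    }

    // The same workload with StringBuffer.
    public static String buildWithBuffer(int n) {
        StringBuffer sb = new StringBuffer("s1");
        for (int i = 0; i < n; i++) {
            sb.append("s2");
        }
        return sb.toString();
    }

    // Runs the workload `rounds` times and returns the elapsed nanoseconds of each round.
    public static long[] timeRounds(Supplier<String> workload, int rounds) {
        long[] nanos = new long[rounds];
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();
            String result = workload.get();
            nanos[r] = System.nanoTime() - start;
            sink += result.length(); // consume the result outside the timed region
        }
        return nanos;
    }

    public static void main(String[] args) {
        final int n = 100_000;
        final int rounds = 10;
        long[] builder = timeRounds(() -> buildWithBuilder(n), rounds);
        long[] buffer = timeRounds(() -> buildWithBuffer(n), rounds);
        for (int r = 0; r < rounds; r++) {
            System.out.printf("round %d: StringBuilder %d us, StringBuffer %d us%n",
                    r, builder[r] / 1_000, buffer[r] / 1_000);
        }
        System.out.println("sink = " + sink);
    }
}
```

On a typical JVM the first rounds tend to be slower than the later ones, and whichever workload runs first also pays for class loading and other one-off costs. A test that runs each method exactly once measures mostly that noise.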

    This article from Oracle goes into more details on these problems:

    In the end, it comes down to this: unless you're very experienced in this, don't write microbenchmarks from scratch. The results you get may have little to do with how that code would perform in a real application.

    Use a framework like JMH (Java Microbenchmark Harness) instead. The article I've linked above contains an intro to JMH, and there are other tutorials about it as well.

    Invest some time and learn to use JMH. It's worth it. When you run your benchmarks with JMH, you will see in its output how drastically the benchmark times change over time due to JVM optimizations taking place (JMH calls your tests multiple times).

    If you run your StringBuilder vs. StringBuffer tests in JMH, you should see that both classes perform just about the same on modern CPUs. StringBuilder is slightly faster, but not by much, so I'd still use StringBuilder.