Search code examples
javafor-loopinteger-overflow

Why do large loops of calculations generate same output regardless of input?


I made these large loops of random operations to benchmark (just because I was curious) and ran into something I just do not understand. No matter what I input for the large loop, it always produces the same result. This is the code cut down to the part I am talking about.

public int stuff;
public int result;

myJavaProgram(String[] cmdArguments)
{ 
  stuff=1;
  superLoopCalcualtion();
  System.out.println(stuff+" converted to result:"+result); 

  stuff=12345;
  superLoopCalcualtion();
  System.out.println(stuff+" converted to result:"+result); 

  stuff=9823450;
  superLoopCalcualtion();
  System.out.println(stuff+" converted to result:"+result); 
}  

public void superLoopCalcualtion()
{
  int a=stuff;
  int b=a+99;   
  int z=0;
  for (z = 0; z < 200000; z++) 
  {
    a=a+22;
    b=a*44;
    a=b+1234;
  }
  result=a;
} 

And the output is this

1 converted to result:1398361394
12345 converted to result:1398361394
9823450 converted to result:1398361394

There is no way that could be right...right??


Solution

  • What you're seeing here is a (camouflaged) overflow that results in an effective multiplication factor being 0, thus effectively ending the value changes in the loop.

    Overflow in action

    To see this, let's consider a simplified code example using bytes which allows the loop to be shorter:

    byte a = stuff;
    for (byte z = 0; z < 8; z++) {
        a = (byte) (a * 2);
    }
    

    I omitted code that prints the numbers nicely in decimal and binary for each iteration. Here are the results for stuff being 1, 11 and 127 (Byte.MAX_VALUE):

    1
    ---
    0: 1        00000001
    1: 2        00000010
    2: 4        00000100
    3: 8        00001000
    4: 16       00010000
    5: 32       00100000
    6: 64       01000000
    7: -128     10000000
    8: 0        00000000
    
    11
    ---
    0: 11       00001011
    1: 22       00010110
    2: 44       00101100
    3: 88       01011000
    4: -80      10110000
    5: 96       01100000
    6: -64      11000000
    7: -128     10000000
    8: 0        00000000
    
    127
    ---
    0: 127      01111111
    1: -2       11111110
    2: -4       11111100
    3: -8       11111000
    4: -16      11110000
    5: -32      11100000
    6: -64      11000000
    7: -128     10000000
    8: 0        00000000
    

    To understand this, consider that multiplying a binary number by 2 adds a 0 from the right. If we do this continually, we "push" the numbers on the left outside of the range that our data structure can hold. For byte that range is 8 bits. Thus, after multiplying 8 times by 2 we guarantee that no matter what the number contained beforehand, it is now all 0s. Continuing has no effect, thus we reached stagnation (or staleness, whatever the term is).

    Pushing significant bits outside the "visible" range is called overflow, because the binary representation can't contain them and they... overflow. In decimal, this results in a change of the sign1. If you look at the example for 1, that overflow happens only at the last iteration because the number was small enough, that is equivalent to saying that it had a lot of free space on the right. 127 on the other hand overflows immediately as it is the maximal value for byte, that is, all the bits are needed.


    1 In Java all numbers are signed.

    Applying to your code

    From here, it's a matter of adding complexity step by step until we reach your code, but the underlying phenomenon is the same.

    For starters, we can increase our binary representation capacity by using short, int and long, but that just delays the inevitable. Instead of needing 8 iterations, we will need 12, 32 and 64, respectively.

    Next, we can change the multiplication factor. An even number is just multiplications of 2, so we will reach the same results. Note that with the special case of 2^n we will always reach the result faster because we are effectively just cutting down on iterations. However, with an odd number we will never reach (decimal) 0; the overflow will always skip it and we will reach our starting number again. Here is stuff = 1 (byte) with multiplication factor 3 for 64 (Byte.MAX_VALUE / 2 + 1) iterations:

    1
    ---
    0: 1        00000001
    1: 3        00000011
    2: 9        00001001
    3: 27       00011011
    4: 81       01010001
    5: -13      11110011
    6: -39      11011001
    7: -117     10001011
    8: -95      10100001
    9: -29      11100011
    10: -87     10101001
    11: -5      11111011
    12: -15     11110001
    13: -45     11010011
    14: 121     01111001
    15: 107     01101011
    16: 65      01000001
    17: -61     11000011
    18: 73      01001001
    19: -37     11011011
    20: -111    10010001
    21: -77     10110011
    22: 25      00011001
    23: 75      01001011
    24: -31     11100001
    25: -93     10100011
    26: -23     11101001
    27: -69     10111011
    28: 49      00110001
    29: -109    10010011
    30: -71     10111001
    31: 43      00101011
    32: -127    10000001
    33: -125    10000011
    34: -119    10001001
    35: -101    10011011
    36: -47     11010001
    37: 115     01110011
    38: 89      01011001
    39: 11      00001011
    40: 33      00100001
    41: 99      01100011
    42: 41      00101001
    43: 123     01111011
    44: 113     01110001
    45: 83      01010011
    46: -7      11111001
    47: -21     11101011
    48: -63     11000001
    49: 67      01000011
    50: -55     11001001
    51: 91      01011011
    52: 17      00010001
    53: 51      00110011
    54: -103    10011001
    55: -53     11001011
    56: 97      01100001
    57: 35      00100011
    58: 105     01101001
    59: 59      00111011
    60: -79     10110001
    61: 19      00010011
    62: 57      00111001
    63: -85     10101011
    64: 1       00000001
    

    I don't want to go into the bit math so much because I feel it's out of the scope of the question at this point. Suffice to say that on the MAX_VALUE / 2 + 1 iteration you will reach the starting number again (and for some numbers also before that).

    The point is that your 44 is even, so you get the stagnating result.

    Now to your addition operations. As much as you play with them, before and after the multiplication, it doesn't do more than change the result by a constant. The effect remains the same. Consider

    for (byte z = 0; z < 10; z++) {
        a = (byte) (a + 1);
        a = (byte) (a * 2);
    }
    

    The result is

    1
    ---
    0: 1        00000001
    1: 4        00000100
    2: 10       00001010
    3: 22       00010110
    4: 46       00101110
    5: 94       01011110
    6: -66      10111110
    7: 126      01111110
    8: -2       11111110
    9: -2       11111110
    10: -2      11111110
    

    So we are stagnating on -2. In decimal you can see this easily with the loop formula: (-2 + 1) * 2 = -2. Your "random" choice of numbers in the loop resulted (deterministically) in settling on the number 1398361394 after ~15 iterations (using longs will delay this result by some number of iterations). Just do the math iteration by iteration and you will reach a loop formula like the above.

    Conclusion time

    Be very careful of overflow! Be sure that the data structure you choose is always sufficient to contain the range of numbers you are working with. Worst case, you have the (non-primitive) type BigInteger for arbitrary precision (but it's much slower). Regardless of any of the parameters discussed above, once you overflow your mathematical result will be wrong (unless you are doing overflow math on purpose).