Tags: java, random, confidence-interval, apache-commons-math

Confidence intervals in Java: testing the random pick of an element from a list of objects


I have a method that picks an object at random from a list of 2 objects. I would like to write a JUnit test (@Test) that asserts, at a given confidence level, that each of the 2 objects has a 50% chance of being picked.

The piece of code under test:

public MySepecialObj pickTheValue(List<MySepecialObj> objs, Random shufflingFactor) {

    // this could probably be done in a more efficient way
    // but my point is asserting on the 50% chance of the 
    // two objects inside the input list
    Collections.shuffle(objs, shufflingFactor);
    return objs.get(0);
}

In the test I would like to provide 2 mocks (firstMySepecialObjMock and secondMySepecialObjMock) as input objects of type MySepecialObj and new Random() as the input shuffling parameter, then assert that firstMySepecialObjMock is chosen 50% of the time and secondMySepecialObjMock is chosen the other 50% of the time.

Something like:

@Test
public void myTestShouldCheckTheConfidenceInterval() {

    // using Mockito here
    MySepecialObj firstMySepecialObjMock = mock(MySepecialObj.class);
    MySepecialObj secondMySepecialObjMock = mock(MySepecialObj.class);

    // using some helpers from Guava to build the input list
    List<MySepecialObj> theListOfTwoElements = Lists.newArrayList(firstMySepecialObjMock, secondMySepecialObjMock);

    // call the method (multiple times? how many?) like:
    MySepecialObj chosenValue = pickTheValue(theListOfTwoElements, new Random());

    // assert somehow on all the choices using a confidence level,
    // verifying that firstMySepecialObjMock was picked ~50% of the time
    // and secondMySepecialObjMock was picked the other ~50% of the time
}

I am not sure about the statistical theory here, so should I perhaps provide a different instance of Random, constructed with different parameters?

I would also like the test to take the confidence level as a parameter (I guess it is usually 95%, but it could be another value?).

  • What could be a pure Java solution/setup of the test involving a confidence level parameter?
  • What could be an equivalent solution/setup of the test involving a helper library such as Apache Commons?

Solution

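    1. First of all, the usual way to pick a random element from a List in Java is with nextInt: random.nextInt(objs.size()) returns a random integer from 0 (inclusive) to objs.size() (exclusive). Unlike Collections.shuffle, it also leaves the caller's list unmodified.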

      public MySepecialObj pickTheValue(List<MySepecialObj> objs, Random random) {
          int i = random.nextInt(objs.size());
          return objs.get(i);
      }
      
    2. Wikipedia describes how many times you should repeat an experiment with 2 possible outcomes for a given confidence level. E.g. for a confidence level of 95% the z-score (critical value) is 1.9599. You also need to choose a maximum error, say 0.01. The number of times to perform the experiment is then zScore^2 / (4 * maxError^2):

      double zScore = 1.9599;   // critical value for a 95% confidence level
      double maxError = 0.01;   // largest acceptable deviation from 0.5
      int numberOfPicks = (int) (Math.pow(zScore, 2)/(4*Math.pow(maxError, 2)));
      

      which results in numberOfPicks = 9603. That's how many times you should call pickTheValue.

    3. This is how I recommend performing the experiment multiple times (note that random is reused across picks):

      Random random = new Random();
      double timesFirstWasPicked = 0;
      double timesSecondWasPicked = 0;
      for (int i = 0; i < numberOfPicks; ++i) {
          MySepecialObj chosenValue = pickTheValue(theListOfTwoElements, random);
          if (chosenValue == firstMySepecialObjMock) {
              ++timesFirstWasPicked;
          } else {
              ++timesSecondWasPicked;
          }
      }
      double probabilityFirst = timesFirstWasPicked / numberOfPicks;
      double probabilitySecond = timesSecondWasPicked / numberOfPicks;
      

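      Then assert that probabilityFirst and probabilitySecond are each no further than maxError from 0.5. Because the check is made at a 95% confidence level, even a perfectly fair picker will fail it in roughly 5% of runs.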

    4. I found a BinomialTest class in apache-commons-math, but I don't see how it can help in your case. It works from the number of trials and the observed successes to a p-value. You want the reverse of that: the number of trials needed for a given confidence level.
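
Steps 2 and 3 above can be tied together in pure Java with the confidence level as a parameter. The following is only a minimal sketch under stated assumptions: the class FairPickCheck and all of its method names are hypothetical, the z-score is found by bisection on an approximate normal CDF (Abramowitz and Stegun formula 7.1.26) rather than taken from a library, and the sample size is rounded up, so 95% with a maximum error of 0.01 gives 9604 instead of the truncated 9603.

```java
import java.util.List;
import java.util.Random;

/** Hypothetical helper (not part of the original answer) combining steps 2 and 3. */
final class FairPickCheck {

    /** Standard normal CDF, using the Abramowitz-Stegun 7.1.26 erf approximation. */
    static double normalCdf(double z) {
        double x = Math.abs(z) / Math.sqrt(2.0);
        double t = 1.0 / (1.0 + 0.3275911 * x);
        double poly = t * (0.254829592 + t * (-0.284496736
                + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
        double erf = 1.0 - poly * Math.exp(-x * x);
        return z >= 0 ? 0.5 * (1.0 + erf) : 0.5 * (1.0 - erf);
    }

    /** Two-sided critical value z with P(|Z| <= z) = confidenceLevel, found by bisection. */
    static double zScore(double confidenceLevel) {
        if (!(confidenceLevel > 0.0 && confidenceLevel < 1.0)) {
            throw new IllegalArgumentException("confidenceLevel must be in (0, 1)");
        }
        double target = (1.0 + confidenceLevel) / 2.0;
        double lo = 0.0;
        double hi = 10.0;
        for (int i = 0; i < 100; i++) {
            double mid = (lo + hi) / 2.0;
            if (normalCdf(mid) < target) {
                lo = mid;
            } else {
                hi = mid;
            }
        }
        return (lo + hi) / 2.0;
    }

    /** Worst-case (p = 0.5) sample size z^2 / (4 * maxError^2), rounded up. */
    static int requiredPicks(double confidenceLevel, double maxError) {
        double z = zScore(confidenceLevel);
        return (int) Math.ceil(z * z / (4.0 * maxError * maxError));
    }

    /** Fraction of picks that returned the first list element; the Random is reused. */
    static <T> double fractionFirstPicked(List<T> objs, Random random, int numberOfPicks) {
        T first = objs.get(0);
        int timesFirst = 0;
        for (int i = 0; i < numberOfPicks; i++) {
            if (objs.get(random.nextInt(objs.size())) == first) {
                timesFirst++;
            }
        }
        return (double) timesFirst / numberOfPicks;
    }

    /** True when the observed fraction is no further than maxError from the expected one. */
    static boolean withinMargin(double observed, double expected, double maxError) {
        return Math.abs(observed - expected) <= maxError;
    }
}
```

In a JUnit test this could be used as assertTrue(FairPickCheck.withinMargin(fraction, 0.5, maxError)), with fraction obtained from fractionFirstPicked and the number of picks from requiredPicks(confidenceLevel, maxError). A fair picker is still expected to fail such an assertion in about (1 - confidenceLevel) of the runs, so passing a seeded Random is one way to keep the test reproducible.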