Why does n-of show a discontinuity when the size of the reported agentset goes from 2 to 3?

The n-of reporter is one of those reporters making random choices, so we know that if we use the same random-seed we will always get the same agentset out of n-of.

n-of takes two arguments: size and agentset (it can also take lists, but a note on this later). I would expect that it works by throwing a pseudo-random number, using this number to choose an agent from agentset, and repeating this process size times.

If this is true we would expect that, if we test n-of on the same agentset and using the same random-seed, but each time increasing size by 1, every resulting agentset will be the same as in the previous extraction plus a further agent. After all, the sequence of pseudo-random numbers used to pick the first (size - 1) agents was the same as before.

This seems to be confirmed generally. The code below highlights the same patches plus a further one everytime size is increased, as shown by the pictures:

to highlight-patches [n]
  clear-all
  random-seed 123
  
  resize-world -6 6 -6 6

  ask n-of n patches [
    set pcolor yellow
  ]
  
  ask patch 0 0 [
    set plabel word "n = " n
  ]
end

But there is an exception: the same does not happen when size goes from 2 to 3. As shown by the pictures below, n-of seems to follow the usual behaviour when starting from a size of 1, but the agentset suddenly changes when size reaches 3 (becoming the agentset of the figures above - which, as far as I can tell, does not change anymore):

What is going on there behind the scenes of n-of, that causes this change at this seemingly-unexplicable threshold?

In particular, this seems to be the case only for n-of. In fact, using a combination of repeat and one-of doesn't show this discontinuity (or at least as far as I've seen):

to highlight-patches-with-repeat [n]
  clear-all
  random-seed 123

  resize-world -6 6 -6 6
  
  repeat n [
    ask one-of patches [
      set pcolor yellow
    ]
  ]
  
  ask patch 0 0 [
    set plabel word "n = " n
  ]
end

Note that this comparison is not influenced by the fact that n-of guarantees the absence of repetitions while repeat + one-of may have repetitions (in my example above the first repetition happens when size reaches 13). The relevant aspect simply is that the reported agentset of size x is consistent with the reported agentset of size x + 1.

On using n-of on lists instead of agentsets

Doing the same on a list results in always different numbers being extracted, i.e. the additional extraction does not equal the previous extraction with the addition of a further number. While this looks to me as a counter-intuitive behaviour from the point of view of expecting always the same items to be extracted from a list if the extraction is based on always the same sequence of pseudo-random numbers, at least it looks to happen consistently and therefore it does not look to me as ambiguous behaviour as in the case of agentsets.

Solution

So let's find out how this works together. Let's start by checking the primitive implementation itself. It lives here. Here is the relevant bit with error handling and comments chopped out for brevity:

    if (obj instanceof LogoList) {
      LogoList list = (LogoList) obj;
      if (n == list.size()) {
        return list;
      }
      return list.randomSubset(n, context.job.random);
    } else if (obj instanceof AgentSet) {
      AgentSet agents = (AgentSet) obj;
      int count = agents.count();
      return agents.randomSubset(n, count, context.job.random);
    }

So we need to investigate the implementations of randomSubset() for lists and agentsets. I'll start with agentsets.

The implementation lives here. And the relevant bits:

    val array: Array[Agent] =
      resultSize match {
        case 0 =>
          Array()
        case 1 =>
          Array(randomOne(precomputedCount, rng.nextInt(precomputedCount)))
        case 2 =>
          val (smallRan, bigRan) = {
            val r1 = rng.nextInt(precomputedCount)
            val r2 = rng.nextInt(precomputedCount - 1)
            if (r2 >= r1) (r1, r2 + 1) else (r2, r1)
          }
          randomTwo(precomputedCount, smallRan, bigRan)
        case _ =>
          randomSubsetGeneral(resultSize, precomputedCount, rng)
      }

So there you go. We can see that there is a special case when the resultSize is 2. It auto-generates 2 random numbers, and flips them to make sure they won't "overflow" the possible choices. The comment on the randomTwo() implementation clarifies that this is done as an optimization. There is similarly a special case for 1, but that's just one-of.

Okay, so now let's check lists. Looks like it's implementation of randomSubset() lives over here. Here is the snippit:

  def randomSubset(n: Int, rng: Random): LogoList = {
    val builder = new VectorBuilder[AnyRef]
    var i = 0
    var j = 0
    while (j < n && i < size) {
      if (rng.nextInt(size - i) < n - j) {
        builder += this(i)
        j += 1
      }
      i += 1
    }
    LogoList.fromVector(builder.result)
  }

The code is a little obtuse, but for each element in the list it's randomly adding it to the resulting subset or not. If early items aren't added, the odds for later items go up (to 100% if need be). So changing the overall size of the list changes the numbers that will be generated in the sequence: rng.nextInt(size - i). That would explain why you don't see the same items selected in order when using the same seed but a larger list.

Elaboration

Okay, so let's elaborate on the n = 2 optimization for agentsets. There are a few things we have to know to explain this:

What does the non-optimized code do?

The non-optimized agentset code looks a lot like the list code I already discussed - it iterates each item in the agentset and randomly decides to add it to the result or not:

val iter = iterator
var i, j = 0
while (j < resultSize) {
  val next = iter.next()
  if (random.nextInt(precomputedCount - i) < resultSize - j) {
    result(j) = next
    j += 1
  }
  i += 1
}

Note that this code, for each item in the agentset will perform a couple of arithmetic operations, precomputedCount - i and resultSize - j as well as the final < comparison and the increments for j and i abd the j < resultSize check for the while loop. It also generates a random number for each checked element (an expensive operation) and calls next() to move our agent iterator forward. If it fills the result set before processing all elements of the agentset it will terminate "early" and save some of the work, but in the worst case scenario it is possible it'll perform all those operations for each element in the agentset when winds up needing the last agent to completely "fill" the results.

What does the optimized code do and why is it better??

So now let's check the optimized code n = 2 code:

if (!kind.mortal)
  Array(
    array(smallRandom),
    array(bigRandom))
else {
  val it = iterator
  var i = 0
  // skip to the first random place
  while(i < smallRandom) {
    it.next()
    i += 1
  }
  val first = it.next()
  i += 1
  while (i < bigRandom) {
    it.next()
    i += 1
  }
  val second = it.next()
  Array(first, second)
}

First, the check for kind.mortal at the start is basically checking if this is a patch agentset or not. Patches never die, so it's safe to assume all agents in the agentset are alive and you can just return the agents found in the backing array at the two provided random numbers as the result.

So on to the second bit. Here we have to use the iterator to get the agents from the set, because some of them might be dead (turtles or links). The iterator will skip over those for us as we call next() to get the next agent. You see the operations here are doing the while checks as it increments i up through the desired random numbers. So here the work is the increments for the indexer, i, as well as the checks for the while() loops. We also have to call next() to move the iterator forward. This works because we know smallRandom is smaller than bigRandom - we're just skipping through the agents and plucking out the ones we want.

Compared the non-optimized version we've avoided generator many of the random numbers, we avoid having an extra variable to track the result set count, and we avoid the math and less-than check to determine memebership in the result set. That's not bad (especially the RNG operations).

What would the impact be? Well if you have a large agentset, say 1000 agents, and you are picking 2 of them, the odds of picking any one agent are small (starting at 1/1000, in fact). That means you will run all that code for a long time before getting your 2 resulting agents.

So why not optimize for n-of 3, or 4, or 5, etc? Well, let's look back at the code to run the optimized version:

case 2 =>
  val (smallRan, bigRan) = {
    val r1 = rng.nextInt(precomputedCount)
    val r2 = rng.nextInt(precomputedCount - 1)
    if (r2 >= r1) (r1, r2 + 1) else (r2, r1)
  }
  randomTwo(precomputedCount, smallRan, bigRan)

That little logic at the end if (r2 >= r1) (r1, r2 + 1) else (r2, r1) makes sure that smallRan < bigRan; that is strictly less than, not equal. That logic gets much more complex when you need to generate 3, 4, or 5+ random numbers. None of them can be the same, and they all have to be in order. There are ways to quickly sort lists of numbers which might work, but generating random numbers without repetition is much harder.