R matrix values recycling?

I am following this YouTube tutorial on agent-based modelling in R https://www.youtube.com/watch?v=uAeSykgXnhg.

I have replicated the code using my own variable names (which helps me understand the tutor's code better). The aim is to track how people become infected with covid-19 when exposed to others. Not every contact results in an infection. Over successive model runs, the number of infected people = population size, the number of uninfected people should be 0. This is my replicated code:

# define first agent
agents <- data.frame(agent_no = 1,
                     state = "e",
                     mixing = runif(1,0,1))

# specify agent population
pop_size <- 100

# fill agent data
for(i in 2:pop_size){
  agent <- data.frame(agent_no = i,
                      state = "s",
                      mixing = runif(1,0,1))
  agents <- rbind(agents, agent)
}

# specify number of model runs
n_times <- 10

# initialise output matrix 
out <- matrix(0, ncol = 2, nrow = n_times)

# run simple agent-based model
for(k in 1:n_times){
  for(i in 1:pop_size){
    # likelihood to meet others
    likelihood <- agents$mixing[i]
    # how many agents will they meet (integer). Add 1 to make sure everybody meets somebody
    connect_with <- round(likelihood * 3, 0) + 1 
    # which agents will they probably meet (list of agents)
    which_others <- sample(1:pop_size, 
                           connect_with, 
                           replace = T, 
                           prob = agents$mixing)
    for(j in 1:length(which_others)){
      contacts <- agents[which_others[j],]
      # if exposed, change state
      if(contacts$state == "e"){
        urand <- runif(1,0,1)
        # control probability of state change
        if(urand < 0.5){
          agents$state[i] <- "e"
        }
      }
    }
  }
  out[k,] <- table(agents$state)
}

When looking at the output, once everybody becomes infected (first column), the number of uninfected people (second column) should be 0 but I get 100, which I suspect is due to recycling.

     [,1] [,2]
 [1,]   12   88
 [2,]   33   67
 [3,]   69   31
 [4,]   86   14
 [5,]   92    8
 [6,]   95    5
 [7,]   97    3
 [8,]   98    2
 [9,]   99    1
[10,]  100  100

I ran some diagnostics to see what is going on:

table(agents$state)
      e 
    100 

agents[agents$state == "s",]

    [1] agent_no state    mixing  
    <0 rows> (or 0-length row.names)

I think 0-length row.names is my issue. The result should be like this:

     [,1] [,2]
 [1,]   12   88
 [2,]   33   67
 [3,]   69   31
 [4,]   86   14
 [5,]   92    8
 [6,]   95    5
 [7,]   97    3
 [8,]   98    2
 [9,]   99    1
[10,]  100    0

Can somebody explain what I am doing wrong? Many thanks.

Solution

I increased n_times to 10000 and can find no evidence of recycling. While that doesn't mean it isn't happening, it means that unfortunately without a clear setup, we are unfortunately going to be unable to reproduce the problem. So my suggestions here are unproven.

Option 1

Given that you found one such scenario that ends with all agents$state == "e", then I'll suggest a trick that will always find at least one "s" (actually, one of each value that you know about):

  out[k,] <- table(c("e", "s", agents$state)) - 1

I'm assuming that the only possible values are "e" and "s"; if there are others, this technique relies completely on the premise that we ensure every possible value is seen at least once, and then decrement everything. Since we "add one observation" for each possible value, subtracting one from the table is safe. With this trick, your check should then be

table(agents$state)
#       e 
#     100 
table(c("e", "s", agents$state))
#       e       s 
#     101       1
table(c("e", "s", agents$state)) - 1
#       e       s 
#     100       0

And therefore recycling should not be a factor.

Option 2

Another technique which is more robust (i.e., does not need to include all possible values) is to force the length, assuming we know with certainty what it should be (which I think we do here):

z <- table(agents$state)
z
#   s 
# 100 
length(z) <- 2
z
#   s     
# 100  NA

Since you "know" that the length should always be 2, you can hard-code the 2 in there.

Option 3

This method is even a little more robust in that you don't need to know the absolute length, they will all be extended to the length of the longest return.

First, reproducible sample data:

set.seed(2021)
agents <- data.frame(agent_no = 1,
                     state = "e",
                     mixing = runif(1,0,1))
# specify agent population
pop_size <- 100
# fill agent data
for(i in 2:pop_size){
  agent <- data.frame(agent_no = i,
                      state = "s",
                      mixing = runif(1,0,1))
  agents <- rbind(agents, agent)
}
head(agents)
#   agent_no state    mixing
# 1        1     e 0.4512674
# 2        2     s 0.7837798
# 3        3     s 0.7096822
# 4        4     s 0.3817443
# 5        5     s 0.6363238
# 6        6     s 0.7013460

Replace your for loop:

for (k in 1:n_times) {
}

with

out <- lapply(seq_len(n_times), function(k) {
  for(i in 1:pop_size){
    # likelihood to meet others
    likelihood <- agents$mixing[i]
    # how many agents will they meet (integer). Add 1 to make sure everybody meets somebody
    connect_with <- round(likelihood * 3, 0) + 1 
    # which agents will they probably meet (list of agents)
    which_others <- sample(1:pop_size, 
                           connect_with, 
                           replace = T, 
                           prob = agents$mixing)
    for(j in 1:length(which_others)){
      contacts <- agents[which_others[j],]
      # if exposed, change state
      if(contacts$state == "e"){
        urand <- runif(1,0,1)
        # control probability of state change
        if(urand < 0.5){
          agents$state[i] <- "e"
        }
      }
    }
  }
  table(agents$state)
})

At this point, you have a list, likely of length-2 vectors:

out[1:3]
# [[1]]
#  e  s 
#  1 99 
# [[2]]
#  e  s 
#  2 98 
# [[3]]
#  e  s 
#  3 97

Note that we can determine the length of all of them with

lengths(out)
#  [1] 2 2 2 2 2 2 2 2 2 2

Similar to option 2 where we force the length of a vector, we can do the same here:

maxlen <- max(lengths(out))
out <- lapply(out, `length<-`, maxlen)
## or more verbosely
out <- lapply(out, function(vec) { length(vec) <- maxlen; vec; })

You can confirm that they are all the same length with table(lengths(out)), should be 2 by n_times of 10.

From here, we can combine all of these vectors into a matrix with

out <- do.call(rbind, out)
out
#        e  s
#  [1,]  1 99
#  [2,]  2 98
#  [3,]  3 97
#  [4,]  2 98
#  [5,]  1 99
#  [6,] 20 80
#  [7,] 12 88
#  [8,]  1 99
#  [9,]  2 98
# [10,]  1 99