I am following this YouTube tutorial on agent-based modelling in R https://www.youtube.com/watch?v=uAeSykgXnhg.
I have replicated the code using my own variable names (which helps me understand the tutor's code better). The aim is to track how people become infected with covid-19 when exposed to others. Not every contact results in an infection. Over successive model runs, the number of infected people = population size, the number of uninfected people should be 0. This is my replicated code:
# define first agent
agents <- data.frame(agent_no = 1,
state = "e",
mixing = runif(1,0,1))
# specify agent population
pop_size <- 100
# fill agent data
for(i in 2:pop_size){
agent <- data.frame(agent_no = i,
state = "s",
mixing = runif(1,0,1))
agents <- rbind(agents, agent)
}
# specify number of model runs
n_times <- 10
# initialise output matrix
out <- matrix(0, ncol = 2, nrow = n_times)
# run simple agent-based model
for(k in 1:n_times){
for(i in 1:pop_size){
# likelihood to meet others
likelihood <- agents$mixing[i]
# how many agents will they meet (integer). Add 1 to make sure everybody meets somebody
connect_with <- round(likelihood * 3, 0) + 1
# which agents will they probably meet (list of agents)
which_others <- sample(1:pop_size,
connect_with,
replace = T,
prob = agents$mixing)
for(j in 1:length(which_others)){
contacts <- agents[which_others[j],]
# if exposed, change state
if(contacts$state == "e"){
urand <- runif(1,0,1)
# control probability of state change
if(urand < 0.5){
agents$state[i] <- "e"
}
}
}
}
out[k,] <- table(agents$state)
}
When looking at the output, once everybody becomes infected (first column), the number of uninfected people (second column) should be 0 but I get 100, which I suspect is due to recycling.
[,1] [,2]
[1,] 12 88
[2,] 33 67
[3,] 69 31
[4,] 86 14
[5,] 92 8
[6,] 95 5
[7,] 97 3
[8,] 98 2
[9,] 99 1
[10,] 100 100
I ran some diagnostics to see what is going on:
table(agents$state)
e
100
agents[agents$state == "s",]
[1] agent_no state mixing
<0 rows> (or 0-length row.names)
I think 0-length row.names is my issue. The result should be like this:
[,1] [,2]
[1,] 12 88
[2,] 33 67
[3,] 69 31
[4,] 86 14
[5,] 92 8
[6,] 95 5
[7,] 97 3
[8,] 98 2
[9,] 99 1
[10,] 100 0
Can somebody explain what I am doing wrong? Many thanks.
I increased n_times
to 10000 and can find no evidence of recycling. While that doesn't mean it isn't happening, it means that unfortunately without a clear setup, we are unfortunately going to be unable to reproduce the problem. So my suggestions here are unproven.
Given that you found one such scenario that ends with all agents$state == "e"
, then I'll suggest a trick that will always find at least one "s"
(actually, one of each value that you know about):
out[k,] <- table(c("e", "s", agents$state)) - 1
I'm assuming that the only possible values are "e"
and "s"
; if there are others, this technique relies completely on the premise that we ensure every possible value is seen at least once, and then decrement everything. Since we "add one observation" for each possible value, subtracting one from the table is safe. With this trick, your check should then be
table(agents$state)
# e
# 100
table(c("e", "s", agents$state))
# e s
# 101 1
table(c("e", "s", agents$state)) - 1
# e s
# 100 0
And therefore recycling should not be a factor.
Another technique which is more robust (i.e., does not need to include all possible values) is to force the length, assuming we know with certainty what it should be (which I think we do here):
z <- table(agents$state)
z
# s
# 100
length(z) <- 2
z
# s
# 100 NA
Since you "know" that the length should always be 2, you can hard-code the 2
in there.
This method is even a little more robust in that you don't need to know the absolute length, they will all be extended to the length of the longest return.
First, reproducible sample data:
set.seed(2021)
agents <- data.frame(agent_no = 1,
state = "e",
mixing = runif(1,0,1))
# specify agent population
pop_size <- 100
# fill agent data
for(i in 2:pop_size){
agent <- data.frame(agent_no = i,
state = "s",
mixing = runif(1,0,1))
agents <- rbind(agents, agent)
}
head(agents)
# agent_no state mixing
# 1 1 e 0.4512674
# 2 2 s 0.7837798
# 3 3 s 0.7096822
# 4 4 s 0.3817443
# 5 5 s 0.6363238
# 6 6 s 0.7013460
Replace your for
loop:
for (k in 1:n_times) {
}
with
out <- lapply(seq_len(n_times), function(k) {
for(i in 1:pop_size){
# likelihood to meet others
likelihood <- agents$mixing[i]
# how many agents will they meet (integer). Add 1 to make sure everybody meets somebody
connect_with <- round(likelihood * 3, 0) + 1
# which agents will they probably meet (list of agents)
which_others <- sample(1:pop_size,
connect_with,
replace = T,
prob = agents$mixing)
for(j in 1:length(which_others)){
contacts <- agents[which_others[j],]
# if exposed, change state
if(contacts$state == "e"){
urand <- runif(1,0,1)
# control probability of state change
if(urand < 0.5){
agents$state[i] <- "e"
}
}
}
}
table(agents$state)
})
At this point, you have a list, likely of length-2 vectors:
out[1:3]
# [[1]]
# e s
# 1 99
# [[2]]
# e s
# 2 98
# [[3]]
# e s
# 3 97
Note that we can determine the length of all of them with
lengths(out)
# [1] 2 2 2 2 2 2 2 2 2 2
Similar to option 2 where we force the length of a vector, we can do the same here:
maxlen <- max(lengths(out))
out <- lapply(out, `length<-`, maxlen)
## or more verbosely
out <- lapply(out, function(vec) { length(vec) <- maxlen; vec; })
You can confirm that they are all the same length with table(lengths(out))
, should be 2
by n_times
of 10.
From here, we can combine all of these vectors into a matrix with
out <- do.call(rbind, out)
out
# e s
# [1,] 1 99
# [2,] 2 98
# [3,] 3 97
# [4,] 2 98
# [5,] 1 99
# [6,] 20 80
# [7,] 12 88
# [8,] 1 99
# [9,] 2 98
# [10,] 1 99