I am building code to run and manage simulations of sampling events at sites that can be in one of three site cohorts. I use rep()
to assign the cohort identifier (1,2, or 3) using the code below:
cohort <- rep(1:n.cohorts, n.sites)
I have put the key line first, although to reproduce my problem you need to run the following lines, which allocate a total number of sites between cohorts for presentation to the rep()
call.
n.cohorts <- 3
s <- 10 # total available sites in this example
# different proportions of the total can be allocated to each cohort, for example
prop.control <- 0.4 ; prop.int <- 0.4 ; prop.ref <- 1-(prop.int+prop.control)
n.control <- prop.control * s; n.int <- prop.int * s; n.ref <- prop.ref * s
n.sites <- c(n.control, n.int, n.ref)
now, n.sites
by itself returns
[1] 4 4 2
so when I run my cohort <- rep(1:n.cohorts, n.sites)
call again I expect cohort
to be a list of 10 items, like this: [1] 1 1 1 1 2 2 2 2 3 3
. What i get, however, is only 9:
> cohort
[1] 1 1 1 1 2 2 2 2 3
If i run the same code where n.sites
is defined directly like so: n.sites <- c(4, 4, 2)
, I get the 10 items i expect. I have redone this several times to convince myself that under both scenarios n.sites
by itself produces identical results.
Can anyone explain why this happens? Many thanks in advance.
David
I think this is one of those arithmetical inaccuracy issues in R. The problem is here:
prop.ref <- 1-prop.int-prop.control
prop.ref*10
#[1] 2
floor(prop.ref*10)
#[1] 1
So r thinks that prop.int+prop.control
are very slightly bigger than 0.8
You can fix it by
cohort <- rep(1:n.cohorts, ceiling(n.sites))
But you're right, it does seem like a SERIOUS bug EDIT - sorry meant SEEM like a serious