I am a complete statistical noob and new to R, hence the question. I've tried to find an implementation of the Rao score for the particular case when one's data
is binary and each observation has bernoulli distribution. I stumbled upon anova
in the R language but failed to understand how to use that. Therefore, I tried implementing Rao score for this particular case myself:
rao.score.bern <- function(data, p0) {
# assume `data` is a list of 0s and 1s
y <- sum(data)
n <- length(data)
phat <- y / n
z <- (phat - p0) / sqrt(p0 * (1 - p0) / n)
p.value <- 2 * (1 - pnorm(abs(z)))
}
I am pretty sure that there is a bug in my code because it produces only two distinct p-values in the following scenario:
p0 <- 1 / 4
p <- seq(from=0.01, to=0.5, by=0.01)
n <- seq(from=5, to=70, by=1)
g <- expand.grid(n, p)
data <- apply(g, 1, function(x) rbinom(x[1], 1, x[2]))
p.values <- sapply(data, function(x) rao.score.bern(x[[1]], p0))
Could someone please show me where the problem is? Could you perhaps point me to a built-in solution in R?
First test, then debug.
Does rao.score.bern
work at all?
rao.score.bern(c(0,0,0,1,1,1), 1/6))
This returns...nothing! Fix it by replacing the ultimate line by
2 * (1 - pnorm(abs(z)))
This eliminates the unnecessary assignment.
rao.score.bern(c(0,0,0,1,1,1), 1/6))
[1] 0.02845974
OK, now we're getting somewhere.
Unfortunately, the code still doesn't work. Let's debug by yanking the call to rao.score.bern
and replacing it by something that shows us the input. Don't apply it to the large input you created! Use a small piece of it:
sapply(data[1:5], function(x) x[[1]])
[1] 0 0 0 0 0
That's not what you expected, is it? It's returning just one zero for each element of data
. What about this?
sapply(data[1:5], function(x) x)
[[1]]
[1] 0 0 0 0 0
[[2]]
[1] 0 0 0 0 0 0
...
[[5]]
[1] 0 0 0 0 0 0 0 0 0
Much better! The variable x
in the call to sapply
refers to the entire vector, which is what you want to pass to your routine. Whence
p.values <- sapply(data, function(x) rao.score.bern(x, p0)); hist(p.values)