I have several variables in my dataset that represent daily timing of events across a week.
For example for two rows might look like:
t1 = c(NA,12.6,10.7,11.5,12.5,9.5,14.1)
t2 = c(23.7,1.2,NA,22.9,23.2,0.5,0.1)
I want to calculate the variance of these rows. To do this, I need the mean and because these are periodic variables, I've adapted the code from this page:
#This can all be wrapped in a function like this
circ.mean <- function(m,int,na.rm=T) {
if(na.rm) m <- m[!is.na(m)]
rad.m = m*(360/int)*(pi/180)
mean.cos = mean(cos(rad.m))
mean.sin = mean(sin(rad.m))
x.deg = atan(mean.sin/mean.cos)*(180/pi)
return(x.deg/(360/int))
}
This works as expected for t2
:
> circ.mean(t2,24)
[1] -0.06803088
although ideally the answer would be 23.93197
. But for t1
, it gives an incorrect answer:
> circ.mean(t1,24)
[1] -0.1810074
whereas using the normal mean
function gives the right answer:
> mean(t1,na.rm=T)
[1] 11.81667
My questions are:
1) Is this "circular mean" code correct and if so, am I using it correctly?
2) I've had a stab my own circ.var
function (see below) to calculate the variance of a periodic variable - will this produce the correct variances for all possible input timing vectors?
circ.var <- function(m,int=NULL,na.rm=TRUE) {
if(is.null(int)) stop("Period parameter missing")
if(na.rm) m <- m[!is.na(m)]
if(sum(!is.na(m))==0) return(NA)
n=length(m)
mean.m = circ.mean(m,int)
var.m = 1/(n-1)*sum((((m-mean.m+(int/2))%%int)-(int/2))^2)
return(var.m)
}
Any help would be hugely appreciated! Thanks for taking the time to read this!
I deleted my old answer, as I believe there was a mistake in the solution I provided.
I've written a series of R scripts that I've made available at my GitHub page which should calculate the mean, variance and other stats.
Thanks to @Gregor for his help.