
Comparing the mean across several nights with the value for each individual night: is this statistically valid?


I am using data from several sites to examine the impact of different factors on site exit over several nights. My response variable is the time at which bats exit a site, measured in minutes since sunset.

I want to look at the 5 nights before a disturbance and the 5 nights after it. I exclude the night of the disturbance (Night 0) from my analysis.

My question is: can I take the mean of my variable (minutes since sunset) over the 5 nights before the disturbance (natural variability) and then compare it with the values on Night +1, +2, +3, +4, and +5? Is this statistically valid?

I am unsure whether to compute the mean of minutes since sunset and assign it to a "before" factor level, or to stack all rows from the 5 nights before and assign them to the "before" factor level.

I hope my question is clear.

Thanks a lot for your response.


Solution

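  • No, I wouldn't take the mean of the pre-disturbance nights, because averaging throws away the night-to-night variability you want to compare against. Instead, I would pool the raw data under a single 'pre-disturbance' factor level and compare it to the 'night 1', 'night 2', etc. data. If you have more than one site, you'll need to include site as a random effect. Also, because your response is a time in minutes until an event occurs, a Gamma distribution is a sensible choice, provided all values are strictly positive (an exit before sunset would give a negative value, which a Gamma model cannot handle). Below is code for how you might accomplish this in R with lme4: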

    library(lme4)

    # Make the pooled 'pre-disturbance' data the model's reference level, so that
    # summary() compares each post-disturbance night directly against it.
    my.data$pool <- relevel(factor(my.data$pool), ref = "pre-disturbance")

    # Gamma GLMM with a log link and a random intercept for site.
    # The Gamma family requires num.minutes to be strictly positive.
    model <- glmer(num.minutes ~ pool + (1 | site),
                   family = Gamma(link = "log"), data = my.data)

    summary(model)
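
The model above assumes that `my.data` already has a `pool` column. As a minimal sketch of how that column might be built, assuming a hypothetical `night` column that counts nights relative to the disturbance (-5 to -1 before, 1 to 5 after, Night 0 already dropped) and made-up values for `num.minutes`:

```r
# Hypothetical example data (values invented for illustration only):
# one row per observation, 'night' counted relative to the disturbance.
my.data <- data.frame(
  site        = rep(c("A", "B"), each = 10),
  night       = rep(c(-5:-1, 1:5), times = 2),
  num.minutes = c(22, 25, 19, 24, 21, 35, 31, 28, 26, 23,
                  18, 20, 17, 21, 19, 30, 27, 25, 22, 20)
)

# Pool the five pre-disturbance nights into a single level and keep
# each post-disturbance night as its own level.
labels <- ifelse(my.data$night < 0,
                 "pre-disturbance",
                 paste0("night", my.data$night))

# Listing 'pre-disturbance' first makes it the reference level.
my.data$pool <- factor(labels,
                       levels = c("pre-disturbance", paste0("night", 1:5)))

# The Gamma family only accepts strictly positive responses,
# so check this before fitting.
stopifnot(all(my.data$num.minutes > 0))

# Number of raw observations in each level of the new factor.
table(my.data$pool)
```

Because the raw rows are kept, the 'pre-disturbance' level carries the natural night-to-night variability into the model rather than collapsing it into a single mean.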