Search code examples
rboxplotdetection

Create boxplot for detection probabilities in r


I have a dataset that looks similar to this:

sex observed date idtag
M 0 10/20/2019 12
M 0 10/20/2019 12
F 0 10/20/2019 21
F 0 10/20/2019 21
M 0 10/21/2019 12
M 1 10/21/2019 14
F 0 10/21/2019 21
M 1 10/21/2019 14
M 1 10/21/2019 14
F 1 10/21/2019 21
M 0 10/23/2019 12
M 0 10/23/2019 12
F 0 10/23/2019 21
F 0 10/23/2019 22
M 0 10/23/2019 14
M 1 10/23/2019 12
F 0 10/23/2019 22
M 1 10/23/2019 14
M 1 10/23/2019 12

I would like to create a boxplot of detection rate by sex. I.e., I would like to compare (total number of observations by sex/number of 1s per sex). I used this code to calculate detection rate by sex:

drrate_sex <- detectiondata %>% 
    group_by(sex) %>% 
   summarise(dr = mean(observed))

This is the standard boxplot code I typically use:

boxplot(? ~ sex, data=drdata, main="Detection by sex",
        xlab="Sex", ylab="Detection rate (%)", notch=T, par(mar=c(4,12,4,12)))

I am not sure how to incorporate detection rate (where I entered ? in the code) into the code to produce a boxplot in r that compares detection rate of females versus males. Any help would be appreciated.


Solution

  • The closest you could come to a boxplot for this type of data is probably a plot of the proportions with a confidence interval shown by an error bar.

    One way to get the numbers you need is to run a logistic regression, and use the coefficients from this to get a confidence interval for the proportions:

    # Logistic regression model
    regression <- glm(observed ~ sex, data = detectiondata, family = binomial)
    
    # Extract coefficients
    coefs <- summary(regression)$coef
    
    # Convert coefficients to odds
    coefs[2, 1] <- coefs[1, 1] + coefs[2, 1]
    odds <- exp(cbind(mean = coefs[,1], 
                      upper = coefs[,1] + 1.96 * coefs[,2], 
                      lower = coefs[,1] - 1.96 * coefs[,2]))
    
    # Convert odds to probabilities
    probs <- odds/(1 + odds)
    
    # Create data frame and add column for sex
    df <- as.data.frame(probs)
    df$sex <- c("Female", "Male")
    

    Which gives the following data frame:

    df
    #>                  mean     upper      lower    sex
    #> (Intercept) 0.1428571 0.5806110 0.01966987 Female
    #> sexM        0.5000000 0.9168654 0.08313460   Male
    

    Which we can plot like this:

    library(ggplot2)
    
    ggplot(df, aes(sex, mean)) +
      geom_errorbar(aes(ymin = lower, ymax = upper, color = sex), size = 3,
                    width = 0.3) +
      geom_point(size = 4, shape = 21, fill = "black")
    


    Data

    detectiondata <- structure(list(sex = c("M", "M", "F", "F", "M", "M", "F", "M", 
    "M", "F", "M", "M", "F", "F", "M", "M", "F", "M", "M"), observed = c(0L, 
    0L, 0L, 0L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 
    1L, 1L), date = c("10/20/2019", "10/20/2019", "10/20/2019", "10/20/2019", 
    "10/21/2019", "10/21/2019", "10/21/2019", "10/21/2019", "10/21/2019", 
    "10/21/2019", "10/23/2019", "10/23/2019", "10/23/2019", "10/23/2019", 
    "10/23/2019", "10/23/2019", "10/23/2019", "10/23/2019", "10/23/2019"
    )), class = "data.frame", row.names = c(NA, -19L))
    

    Created on 2021-11-05 by the reprex package (v2.0.0)