
How to apply a pairwiseCI test over multiple rows in a data frame of results


I'm trying to calculate confidence intervals for the difference in some results using pairwiseCI.

The dataframe looks like this:

Category Male_Success Female_Success Male_UnSuccessful Female_UnSuccessful
A 100 150 90 60
B 70 40 30 80
C 20 30 50 50

To calculate the confidence interval for the difference in the proportion of successes for Category A, I would use the following code:

library(pairwiseCI)

success <- c(100, 150)
failure <- c(90, 60)
page <- c(2,1)
dataframe <- data.frame(cbind(success,failure,page))
pairwiseCI(cbind(success,failure)~page, data=dataframe, method="Prop.diff", CImethod="CC")

which gives the following output:

95 %-confidence intervals 
Method:  Continuity corrected interval for the difference of proportions 
  
estimate   lower   upper
2-1   -0.188 -0.2867 -0.0893

I would like to produce this for all 3 categories without typing them individually (I've used the 'apply' function before for chi-sq tests over a dataframe but cannot figure out how to use it in this setting). Ideally, I would like the estimate, lower and upper results printed in columns next to the original dataframe so it looks like this:

Category Male_Success Female_Success Male_UnSuccessful Female_UnSuccessful estimate lower upper

Thank you very much for your help in advance!


Solution

  • You can create a helper function and apply it to each row. In this example I use stats::prop.test() rather than a specialty package (pairwiseCI); with its default continuity correction it gives the same interval for Category A as the pairwiseCI call in the question.

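    1. Helper function that takes the four success/failure counts and returns a list with the estimate and the confidence interval bounds
    f <- function(s1,s2,f1,f2) {
      # 2x2 matrix: column 1 = successes, column 2 = failures; row 1 = male, row 2 = female
      k <- prop.test(matrix(c(s1,s2,f1,f2),nrow=2,ncol=2))
      # -1*diff() gives prop 1 - prop 2 (male - female), matching the direction of k$conf.int
      setNames(as.list(c(-1*diff(k$estimate),k$conf.int)),c("estimate", "lower","upper"))
    }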
    
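    2. Apply the function to each row (grouping by Category works here because each Category appears in exactly one row)
    library(data.table)
    # setDT() converts df (defined under Input) to a data.table by reference; := adds the three new columns
    setDT(df)[, (c("estimate", "lower", "upper")):= f(Male_Success, Female_Success, Male_UnSuccessful, Female_UnSuccessful), Category]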
    

    Note: above I use data.table, but you could also use dplyr and tidyr, like this:

    library(dplyr)
    library(tidyr)
    
    df %>% 
      group_by(Category) %>%
      mutate(r = list(f(Male_Success,Female_Success, Male_UnSuccessful, Female_UnSuccessful))) %>% 
      ungroup() %>% 
      unnest_wider(r)
    

    Output (data.table version; estimate is the male success proportion minus the female success proportion):

       Category Male_Success Female_Success Male_UnSuccessful Female_UnSuccessful    estimate      lower       upper
         <char>        <int>          <int>             <int>               <int>       <num>      <num>       <num>
    1:        A          100            150                90                  60 -0.18796992 -0.2866507 -0.08928912
    2:        B           70             40                30                  80  0.36666667  0.2342893  0.49904403
    3:        C           20             30                50                  50 -0.08928571 -0.2525247  0.07395327
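
As a sanity check on the Category A row, the interval can be recomputed by hand. The sketch below assumes that prop.test() with its defaults uses a Wald interval for the difference of two proportions plus a Yates continuity correction of 0.5 * (1/n1 + 1/n2); that is my reading of its behaviour, not something stated in the question. The variable names (s, u, manual, reference) are only illustrative.

```r
# Counts for Category A, taken from the question.
s <- c(male = 100, female = 150)   # successes
u <- c(male = 90,  female = 60)    # failures
n <- s + u                         # group sizes
p <- s / n                         # observed success proportions

est   <- unname(p["male"] - p["female"])      # male minus female
z     <- qnorm(0.975)                         # two-sided 95% level
yates <- min(0.5, abs(est) / sum(1 / n))      # assumed continuity correction
width <- z * sqrt(sum(p * (1 - p) / n)) + yates * sum(1 / n)

manual <- c(estimate = est,
            lower    = max(est - width, -1),
            upper    = min(est + width, 1))

# The same quantities from prop.test(), for comparison.
k <- prop.test(cbind(s, u))
reference <- c(estimate = unname(k$estimate[1] - k$estimate[2]),
               lower    = k$conf.int[1],
               upper    = k$conf.int[2])

print(round(manual, 4))
print(all.equal(manual, reference))
```

If the assumption holds, the two vectors agree and should reproduce the estimate, lower and upper values shown for Category A.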
    

    Input:

    df = structure(list(Category = c("A", "B", "C"), Male_Success = c(100L, 
    70L, 20L), Female_Success = c(150L, 40L, 30L), Male_UnSuccessful = c(90L, 
    30L, 50L), Female_UnSuccessful = c(60L, 80L, 50L)), row.names = c(NA, 
    -3L), class = "data.frame")
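
If you would rather not load any extra packages, the same row-wise idea can be sketched in base R with mapply(). This is only an illustration, not the approach used in the answer: ci_row is a hypothetical helper name, and it returns a named numeric vector instead of a list so that mapply() can simplify the results into a matrix.

```r
# Same input as in the answer above.
df <- data.frame(
  Category            = c("A", "B", "C"),
  Male_Success        = c(100L, 70L, 20L),
  Female_Success      = c(150L, 40L, 30L),
  Male_UnSuccessful   = c(90L, 30L, 50L),
  Female_UnSuccessful = c(60L, 80L, 50L)
)

# Hypothetical helper: one row's counts in, a named numeric vector out.
ci_row <- function(s1, s2, f1, f2) {
  k <- prop.test(matrix(c(s1, s2, f1, f2), nrow = 2))
  c(estimate = unname(k$estimate[1] - k$estimate[2]),  # male minus female
    lower    = k$conf.int[1],
    upper    = k$conf.int[2])
}

# mapply() walks the four columns in parallel. Each call returns three numbers,
# so the result is a 3 x nrow(df) matrix, transposed here to one row per Category.
ci <- t(mapply(ci_row,
               df$Male_Success, df$Female_Success,
               df$Male_UnSuccessful, df$Female_UnSuccessful))

# Bind the three new columns next to the original data frame.
result <- cbind(df, ci)
print(result)
```

Because this version iterates over rows rather than grouping by Category, it does not depend on the Category values being unique.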