Unnesting a dataframe within a dataframe


I have been trying to calculate confidence intervals for binomial proportions with the Hmisc R package. Specifically, I used the binconf function, which does its job perfectly.

library(plyr)
library(dplyr)  # provides the %>% pipe; load after plyr
library(Hmisc)

Student <- c("A", "B", "C")
TP <- c(13, 36, 43)
obs.pos <- c(16, 37, 48)

df <- data.frame(Student, TP, obs.pos)

df1 <- df %>% 
  plyr::mutate(Sen = binconf(TP, obs.pos, alpha = 0.05, method = "wilson", return.df = TRUE))

df1 %>% View()

#  Student TP obs.pos Sen.PointEst Sen.Lower Sen.Upper
#1       A 13      16    0.8125000 0.5699112 0.9340840
#2       B 36      37    0.9729730 0.8617593 0.9986137
#3       C 43      48    0.8958333 0.7783258 0.9546783

Unfortunately, the function seems to create a data frame within my original data frame, and that prevents me from applying basic functions to my output. For instance, I cannot select columns (using dplyr) or round digits, because R is not able to find the created columns (such as Sen.PointEst, Sen.Lower, Sen.Upper). The structure of my output is shown below.

df1 %>% str()

#'data.frame':  3 obs. of  4 variables:
# $ Student: Factor w/ 3 levels "A","B","C": 1 2 3
# $ TP     : num  13 36 43
# $ obs.pos: num  16 37 48
# $ Sen    :'data.frame':   3 obs. of  3 variables:
#  ..$ PointEst: num  0.812 0.973 0.896
#  ..$ Lower   : num  0.57 0.862 0.778
#  ..$ Upper   : num  0.934 0.999 0.955

I would like to have all the columns at the first level of my output so that I can easily apply all the normal functions to my output.

Thanks for any help!


Solution

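  • The Sen column is itself a data.frame stored inside the outer data.frame, which is why Sen.PointEst, Sen.Lower and Sen.Upper are not found as top-level columns. One option to flatten it is to pass df1 to data.frame via do.call, which promotes the inner columns to the top level with a "Sen." prefix: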

    dfN <- do.call(data.frame, df1) 
    

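    Another option is to avoid the nesting in the first place by calling binconf inside do (from dplyr). Here binconf returns a matrix, which data.frame expands into separate Sen.* columns: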

    df %>% 
      do(data.frame(., Sen = binconf(.$TP, .$obs.pos, alpha = 0.05, method = "wilson")))
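
    To illustrate the first option end to end, here is a minimal base R sketch. It assumes the nested Sen column can be rebuilt by hand (with values rounded from the output in the question) rather than by calling binconf, so it runs without Hmisc installed. The names df1 and dfN follow the code above.

```r
# Outer data frame, as in the question
df <- data.frame(
  Student = c("A", "B", "C"),
  TP      = c(13, 36, 43),
  obs.pos = c(16, 37, 48)
)

# Hand-built stand-in for the binconf() result (assumption: values are
# copied and rounded from the question's printed output)
df1 <- df
df1$Sen <- data.frame(
  PointEst = c(0.8125, 0.9730, 0.8958),
  Lower    = c(0.5699, 0.8618, 0.7783),
  Upper    = c(0.9341, 0.9986, 0.9547)
)

# Only four top-level names; "Sen" holds the three inner columns
names(df1)

# Flatten: each inner column is promoted and prefixed with "Sen."
dfN <- do.call(data.frame, df1)
names(dfN)

# Ordinary column operations now work on the flattened result
dfN$Sen.PointEst <- round(dfN$Sen.PointEst, 2)
dfN[, c("Student", "Sen.Lower", "Sen.Upper")]
```

    After flattening, Sen.PointEst, Sen.Lower and Sen.Upper are regular columns, so selecting and rounding behave as they do for any other column.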