
Adding conditional probabilities to data.table in R


I have the Titanic dataset and want to find the probability of survival conditional on three variables: whether the fare is above 200, the passenger class, and the sex. The following table gives these probabilities as percentages.

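library(PASWR2)
library(magrittr)  # provides the %>% pipe used below
# row percentages of survived within each fare/pclass/sex combination
tab = with(TITANIC3, ftable(fare = fare > 200, pclass, sex, survived)) %>% prop.table(1) %>% round(3) * 100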
tab

Is there an easy way to add the probabilities from tab to the TITANIC3 dataset as a new column?

Thanks!


Solution

  • This can be done with the data.table package. TITANIC3 is a data.frame, so first convert it to a data.table. With data.table you can define a new column from a grouped aggregation in a single line, using := together with by. Just run the code below.

    The new column with the conditional probability of survival is survival_prob, given in percent and rounded to one decimal place. Rows where fare is missing end up in a group of their own, because fare > 200 is NA for them. I generally recommend data.table because it is very fast for this kind of data manipulation in R. However, if you want to continue your analysis with a data.frame, just use setDF(titanic3) to convert the object back to class data.frame.

    library(PASWR2)
    library(magrittr)
    library(data.table)
    
    # convert dataset from data frame to data table 
    titanic3 <- copy(TITANIC3)
    setDT(titanic3)
    
    # define new column survival_prob using by-option
    titanic3[, survival_prob := round(100*mean(survived), 1), 
             by = .(fare > 200, pclass, sex)]
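
The question asks specifically about attaching values from a lookup table, so here is a minimal sketch of that two-step route: build a per-group table first, then join it back onto the rows. It assumes a small made-up data set (toy, with hypothetical values standing in for TITANIC3) and a helper column high_fare that is not part of the original data.

```r
library(data.table)

# Toy data with hypothetical values; it only mimics the relevant TITANIC3 columns.
toy <- data.table(
  fare     = c(250, 300, 50, 60, 20, 30, 10, 15),
  pclass   = c("1st", "1st", "1st", "1st", "3rd", "3rd", "3rd", "3rd"),
  sex      = c("female", "female", "male", "male",
               "female", "female", "male", "male"),
  survived = c(1L, 1L, 1L, 0L, 1L, 0L, 0L, 0L)
)

# Step 1: lookup table with one row per (high_fare, pclass, sex) group.
probs <- toy[, .(survival_prob = round(100 * mean(survived), 1)),
             by = .(high_fare = fare > 200, pclass, sex)]

# Step 2: add the grouping flag to the rows, then copy the matching
# survival_prob from probs into toy by reference (an update join).
toy[, high_fare := fare > 200]
toy[probs, on = .(high_fare, pclass, sex), survival_prob := i.survival_prob]

print(toy)
```

This should give the same column as the one-line := with by shown above. The separate lookup table is mainly useful when you also want to inspect or reuse the group-level percentages.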