
How to find probability of a dataframe in R?


I have a dataframe. Here's a small part of it:

     SubjectId Gender Groups  ExtraCalories        GW
1:         1      F     G3    -1310.00000    0.000000
2:         2      M     G6     -920.79656    4.331278
3:         3      M     G2      -25.39517    4.727376
4:         4      M     G5      169.25645    3.543941
5:         5      M     G5     -340.67235    4.591774
---                                               
996:     996      F     G1     464.82543     5.933792
997:     997      M     G8    -323.65136     5.024453
998:     998      F     G3      77.92138     5.383686
999:     999      M     G9    -237.83700     5.423941
1000:   1000      F     G9    -400.44831     6.837965

How do I find the probability of a female choosing G5?


Solution

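  • I assume you want to estimate the probability by the relative frequency, i.e. the share of females who are in G5. The last two options are more general than the base R solution: they return the proportion for every group among females, not only G5.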

    Base R

    nrow(df[df$Gender == "F" & df$Groups == "G5",])/nrow(df[df$Gender == "F",])
    

    dplyr

    library(dplyr)
    df %>% filter(Gender == "F") %>%
       group_by(Groups) %>%
       summarise(n = n()) %>%
       ungroup() %>%
       # divide by the total number of females; n() here would count groups, not people
       mutate(p = n/sum(n))
    

    data.table

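    library(data.table)
    setDT(df)
    # count females per group, then divide by the total number of females
    # (.N in the last step would be the number of groups, so use sum(n))
    df[Gender == "F"][, .(n = .N), by = Groups][, .(Groups, p = n/sum(n))]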
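
To sanity-check any of the approaches above, it can help to run them on a tiny data frame where the answer is obvious by hand. The sketch below is only an illustration: the six-row `df` is made up (it is not the data from the question), and `p_direct` and `p_table` are names chosen here. It also shows a base R alternative using `table()` and `prop.table()`, which gives the row-wise proportions for every gender and group at once.

```r
# Toy data, made up for illustration: 4 females (2 of them in G5) and 2 males
df <- data.frame(
  Gender = c("F", "F", "F", "F", "M", "M"),
  Groups = c("G5", "G5", "G1", "G3", "G5", "G2")
)

# Share of females who are in G5: the mean of a logical vector is the proportion of TRUEs
p_direct <- mean(df$Groups[df$Gender == "F"] == "G5")

# Contingency table of Gender x Groups, normalised so that each row (gender) sums to 1
tab <- prop.table(table(df$Gender, df$Groups), margin = 1)
p_table <- tab["F", "G5"]

print(p_direct)
print(p_table)
```

Both values should agree with the hand count (2 of the 4 females are in G5). On the real data, the same two lines apply unchanged as long as the columns are named `Gender` and `Groups`.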