Search code examples
rdplyrtidyversetidyrtidytext

Apply Math calculation to all rows of DF by Column Values


I want to apply a math calculation which is (Occ_1+1)/(Totl_1+Unique_words) , (Occ_2+1)/(Totl_2+Unique_words) and (Occ_3+1)/(Totl_3+Unique_words) and create a new column as Probability_1, Probability_2, Probability_3

Right now i am doing every calculation seperately and combining them together.

Ex: for (Occ_1+1) i am doing sapply(df$Occ_1, function(x){x+1}).

i have almost 50 Occ_ and 50 Totl_ so my code is getting very lengthy if i do all the calculations seperately.
Is there a way to do all the calculations at once.

sample DF till Occ_3 and Totl_3 only

 word        Occ_1  Occ_2  Occ_3  Totl_1 Totl_2 Totl_3 Unique_words
  <chr>      <int>  <int>  <int>  <int>  <int>  <int>        <int>
 1 car          0     1     0     11      9      7           17
 2 saturn       2     0     2     11      9      7           17
 3 survival     1     2     0     11      9      7           17
 4 baseball     1     1     0     11      9      7           17
 5 color        0     0     1     11      9      7           17
 6 muscle       0     1     0     11      9      7           17

Solution

  • I would just gather all the Occ.. , Tot.. columns together and perform the required arithmetic

    occ_cols <- grep("^Occ", names(df))
    tot_cols <- grep("^Totl", names(df))
    
    df[paste0("Probability_", 1:length(occ_cols))] <- 
          (df[occ_cols] + 1)/(df[tot_cols] + df$Unique_words)
    
    df
    #      word Occ_1 Occ_2 Occ_3 Totl_1 Totl_2 Totl_3 Unique_words Probability_1
    #1      car     0     1     0     11      9      7           17    0.03571429
    #2   saturn     2     0     2     11      9      7           17    0.10714286
    #3 survival     1     2     0     11      9      7           17    0.07142857
    #4 baseball     1     1     0     11      9      7           17    0.07142857
    #5    color     0     0     1     11      9      7           17    0.03571429
    #6   muscle     0     1     0     11      9      7           17    0.03571429
    
    #  Probability_2 Probability_3
    #1    0.07692308    0.04166667
    #2    0.03846154    0.12500000
    #3    0.11538462    0.04166667
    #4    0.07692308    0.04166667
    #5    0.03846154    0.08333333
    #6    0.07692308    0.04166667
    

    However, make sure all your Occ.. and Tot.. columns are in the same order. For this example, we have Occ_1, Occ_2, Occ_3 followed by Totl_1, Totl_2 and Totl_3.