I want to apply a math calculation which is (Occ_1+1)/(Totl_1+Unique_words)
, (Occ_2+1)/(Totl_2+Unique_words)
and (Occ_3+1)/(Totl_3+Unique_words)
and create a new column as Probability_1
, Probability_2
, Probability_3
Right now i am doing every calculation seperately and combining them together.
Ex: for (Occ_1+1)
i am doing sapply(df$Occ_1, function(x){x+1})
.
i have almost 50 Occ_
and 50 Totl_
so my code is getting very lengthy if i do all the calculations seperately.
Is there a way to do all the calculations at once.
sample DF till Occ_3
and Totl_3
only
word Occ_1 Occ_2 Occ_3 Totl_1 Totl_2 Totl_3 Unique_words
<chr> <int> <int> <int> <int> <int> <int> <int>
1 car 0 1 0 11 9 7 17
2 saturn 2 0 2 11 9 7 17
3 survival 1 2 0 11 9 7 17
4 baseball 1 1 0 11 9 7 17
5 color 0 0 1 11 9 7 17
6 muscle 0 1 0 11 9 7 17
I would just gather all the Occ..
, Tot..
columns together and perform the required arithmetic
occ_cols <- grep("^Occ", names(df))
tot_cols <- grep("^Totl", names(df))
df[paste0("Probability_", 1:length(occ_cols))] <-
(df[occ_cols] + 1)/(df[tot_cols] + df$Unique_words)
df
# word Occ_1 Occ_2 Occ_3 Totl_1 Totl_2 Totl_3 Unique_words Probability_1
#1 car 0 1 0 11 9 7 17 0.03571429
#2 saturn 2 0 2 11 9 7 17 0.10714286
#3 survival 1 2 0 11 9 7 17 0.07142857
#4 baseball 1 1 0 11 9 7 17 0.07142857
#5 color 0 0 1 11 9 7 17 0.03571429
#6 muscle 0 1 0 11 9 7 17 0.03571429
# Probability_2 Probability_3
#1 0.07692308 0.04166667
#2 0.03846154 0.12500000
#3 0.11538462 0.04166667
#4 0.07692308 0.04166667
#5 0.03846154 0.08333333
#6 0.07692308 0.04166667
However, make sure all your Occ..
and Tot..
columns are in the same order. For this example, we have Occ_1
, Occ_2
, Occ_3
followed by Totl_1
, Totl_2
and Totl_3
.