Search code examples
rtidyversepercentagesapply

calculate percentage of total for multiple columns


I have a data-frame in R with several columns that contribute to a totals column, as per below:

data <- data_frame(
Date = c("14/12/2018", "15/12/2018", "16/12/2018"),
Ent = c("C1", "C1", "C1"),
Ans = c(4, 9, 12),
Aban = c(1, 2, 1),
OOH = c(7, 5, 6),
Total = c(12, 16, 19),
)

Output below:

Date       Ent     Ans  Aban   OOH Total
<chr>      <chr> <dbl> <dbl> <dbl> <dbl>
14/12/2018 C1        4     1     7    12
15/12/2018 C1        9     2     5    16
16/12/2018 C1       12     1     6    19

What I am wanting to do is find the most efficient way that I can calculate the percentage contribution of each column to the total. Below I have my current solution which requires three separate lines of code:

#Ans
data$AnsP <- (data$Ans / data$Total) * 100

#Aban
data$AbanP <- (data$Aban / data$Total) * 100

#OOH
data$OOHP <- (data$OOH / data$Total) * 100

However, as I anticipate the source data-set to grow, this will eventually become sub-optimal for multiple variables

Is there an easy way I can calculate these percentage contributions in a single line of code, returning these percentages as columns in the existing dataframe? Perhaps with sapply or a function? I have made some crude attempts, but they have not worked

Desire Output as a dataframe:

Date       Ent     Ans  Aban   OOH Total  AnsP AbanP  OOHP
<chr>      <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
14/12/2018 C1        4     1     7    12  33.3  8.33  58.3
15/12/2018 C1        9     2     5    16  56.2 12.5   31.2
16/12/2018 C1       12     1     6    19  63.2  5.26  31.6

Any assistance would be appreciated on this

Regards, Tom


Solution

  • With dplyr

    library(dplyr)
    
    data %>%
       mutate_at(vars(Ans:OOH) , funs(P = ./data$Total * 100))
    
    
    #   Date       Ent     Ans  Aban   OOH Total Ans_P Aban_P OOH_P
    #  <chr>      <chr> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl>
    #1 14/12/2018 C1        4     1     7    12  33.3   8.33  58.3
    #2 15/12/2018 C1        9     2     5    16  56.2  12.5   31.2
    #3 16/12/2018 C1       12     1     6    19  63.2   5.26  31.6
    

    Or if you prefer base R

    cols <- 3:5
    cbind(data, data[cols]/data$Total * 100)
    

    As Total column is same as sum of cols column we could also do

    data[cols]/rowSums(data[cols]) * 100