R: How to calculate percentages depending on two different categorical variables?


Here is a dummy input dataset:

numbers <- c(10, 50, 3, 60, 100, 40, 2, 40, 10, 50)
id <- c("a", "a", "a", "a", "b", "b", "b", "b", "c", "c")
variation <- c("type1", "type2", "type2", "type3", "type1", "type2", "type3", "type3", "type2", "type2")

data <- data.frame(id, numbers, variation)
print(data)  # head(data) would show only the first 6 rows

#    id numbers variation
# 1   a      10     type1
# 2   a      50     type2
# 3   a       3     type2
# 4   a      60     type3
# 5   b     100     type1
# 6   b      40     type2
# 7   b       2     type3
# 8   b      40     type3
# 9   c      10     type2
# 10  c      50     type2

My question is: how do I calculate, for each id, the percentage that each row's "numbers" value contributes to that id's total?

Here is the expected output, with the new "percent" variable added:

#    id   numbers variation   percent
# 1   a        10     type1  8.130081   
# 2   a        50     type2  40.65041
# 3   a         3     type2  2.439024
# 4   a        60     type3  48.78049
# 5   b       100     type1  54.94505
# 6   b        40     type2  21.97802
# 7   b         2     type3  1.098901
# 8   b        40     type3  21.97802  
# 9   c        10     type2  16.66667
# 10  c        50     type2  83.33333

Base R and dplyr approaches are preferred. Thank you.


Solution

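  • Divide each value by the sum of numbers within its id group. With dplyr (version 1.1.0 or later, which introduced the .by argument):

    dplyr::mutate(data, percent = 100 * numbers / sum(numbers), .by = id)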
    

    Output:

       id numbers variation   percent
    1   a      10     type1  8.130081
    2   a      50     type2 40.650407
    3   a       3     type2  2.439024
    4   a      60     type3 48.780488
    5   b     100     type1 54.945055
    6   b      40     type2 21.978022
    7   b       2     type3  1.098901
    8   b      40     type3 21.978022
    9   c      10     type2 16.666667
    10  c      50     type2 83.333333
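
Since the question also asks for a base R approach, here is a minimal sketch of one possibility, assuming the same data frame as in the question. It uses ave() to compute the per-id sum aligned with the original rows; this is one option among several, not necessarily the most idiomatic one for every case.

```r
# Rebuild the example data from the question
numbers <- c(10, 50, 3, 60, 100, 40, 2, 40, 10, 50)
id <- c("a", "a", "a", "a", "b", "b", "b", "b", "c", "c")
variation <- c("type1", "type2", "type2", "type3", "type1",
               "type2", "type3", "type3", "type2", "type2")
data <- data.frame(id, numbers, variation)

# ave() applies sum() within each id group and returns a vector
# with one entry per row, so it can be used directly as the divisor
group_total <- ave(data$numbers, data$id, FUN = sum)
data$percent <- 100 * data$numbers / group_total

print(data)
```

Within each id the percent values add up to 100. The variation column is not used in the calculation; to compute percentages within each id and variation combination instead, both columns could be passed to ave() as grouping variables.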