Here is a dummy input dataset:
numbers <- c(10, 50, 3, 60, 100, 40, 2, 40, 10, 50)
id <- c("a", "a", "a", "a", "b", "b", "b", "b", "c", "c")
variation <- c("type1", "type2", "type2", "type3", "type1", "type2", "type3", "type3", "type2", "type2")
data <- data.frame(id, numbers, variation)
head(data, 10)  # head() shows only 6 rows by default, so ask for all 10
#    id numbers variation
# 1   a      10     type1
# 2   a      50     type2
# 3   a       3     type2
# 4   a      60     type3
# 5   b     100     type1
# 6   b      40     type2
# 7   b       2     type3
# 8   b      40     type3
# 9   c      10     type2
# 10  c      50     type2
My question: how do I calculate, for each id, the percentage that each value of "numbers" contributes to that id's total?
Here is the expected output, with the new "percent" column:
#    id numbers variation  percent
# 1   a      10     type1 8.130081
# 2   a      50     type2 40.65041
# 3   a       3     type2 2.439024
# 4   a      60     type3 48.78049
# 5   b     100     type1 54.94505
# 6   b      40     type2 21.97802
# 7   b       2     type3 1.098901
# 8   b      40     type3 21.97802
# 9   c      10     type2 16.66667
# 10  c      50     type2 83.33333
A base R or dplyr approach is preferred. Thank you.
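With dplyr (version 1.1.0 or later, which added the .by argument), divide each value by its group total:
dplyr::mutate(data, percent = 100 * numbers / sum(numbers), .by = id)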
Output:
   id numbers variation   percent
1   a      10     type1  8.130081
2   a      50     type2 40.650407
3   a       3     type2  2.439024
4   a      60     type3 48.780488
5   b     100     type1 54.945055
6   b      40     type2 21.978022
7   b       2     type3  1.098901
8   b      40     type3 21.978022
9   c      10     type2 16.666667
10  c      50     type2 83.333333
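Since the question also asks for base R, here is a minimal sketch of one way to do the same thing without dplyr, using ave() to repeat each id's total on every row of that group. The intermediate name group_total is only illustrative and does not appear in the question.

```r
# Same dummy data as in the question
numbers <- c(10, 50, 3, 60, 100, 40, 2, 40, 10, 50)
id <- c("a", "a", "a", "a", "b", "b", "b", "b", "c", "c")
variation <- c("type1", "type2", "type2", "type3", "type1",
               "type2", "type3", "type3", "type2", "type2")
data <- data.frame(id, numbers, variation)

# ave() applies FUN within each level of data$id and returns a vector
# as long as the input, so every row gets its own group's total.
# FUN has to be passed by name because it comes after `...` in ave().
group_total <- ave(data$numbers, data$id, FUN = sum)  # illustrative name

# Share of each row within its id, as a percentage
data$percent <- 100 * data$numbers / group_total
data
```

This should give the same percent column as the dplyr call above, and the percentages within each id add up to 100.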