I'm a beginner with R and need help with the task. The output of dummy dataset with dput (real set is really big):
structure(list(CODE = c(453, 463, 476,
798, 885, 582, 626, 663, 457, 408
), CATEGORY = c("CIG", "BET", "CIG", "CIG", "ARI", "CRR", "ARI", "CIG",
"CIG", "BET"), AMOUNT = c(22, 5, 6, 52, 16, 11, 6, 70, 208, 5),
PRICE = c(5.56, 8.29, 3.89, 3.8, 4.05, 3.99, 3.55, 7.69, 6.75,
5.2), BRAND = c("ROTHMANS", "ALLINO", "MARLBORO", "ROTHMANS", "AURIELO",
"SOLINOS", "CHLEBLO", "MARLBORO", "LD", "SOLINOS"
)), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))
What should be done: a column which will show the share of every BRAND in percent should be added. First what I have done is create a new column "VALUE" this way:
df$VALUE <- with(df, AMOUNT*PRICE)
Now the column SHARE must be created in such a way: sum of values (from column VALUE) of a particular brand in a particular category divided by the sum of values of the whole category. For example, "ROTHMANS" is in category CIG, the sum of values for it is 319.92, the sum for the whole CIG category is 2285,56, so, the SHARE=14%. And it should be calculated in each case. I think dplyr library can be suitable, but can't find a solution.
You may first sum
the BRAND
value and get the proportion for each CATEGORY
.
library(dplyr)
df %>%
group_by(CATEGORY, BRAND) %>%
summarise(VALUE = sum(VALUE)) %>%
mutate(SHARE = prop.table(VALUE) * 100) %>%
ungroup
# CATEGORY BRAND VALUE SHARE
# <chr> <chr> <dbl> <dbl>
#1 ARI AURIELO 64.8 75.3
#2 ARI CHLEBLO 21.3 24.7
#3 BET ALLINO 41.4 61.5
#4 BET SOLINOS 26 38.5
#5 CIG LD 1404 61.4
#6 CIG MARLBORO 562. 24.6
#7 CIG ROTHMANS 320. 14.0
#8 CRR SOLINOS 43.9 100