Search code examples
rdataframedplyrcalculated-columns

Calculating columns in percent based on other columns for particular category in R


I'm a beginner with R and need help with the task. The output of dummy dataset with dput (real set is really big):

structure(list(CODE = c(453, 463, 476, 
798, 885, 582, 626, 663, 457, 408
), CATEGORY = c("CIG", "BET", "CIG", "CIG", "ARI", "CRR", "ARI", "CIG", 
"CIG", "BET"), AMOUNT = c(22, 5, 6, 52, 16, 11, 6, 70, 208, 5), 
    PRICE = c(5.56, 8.29, 3.89, 3.8, 4.05, 3.99, 3.55, 7.69, 6.75, 
    5.2), BRAND = c("ROTHMANS", "ALLINO", "MARLBORO", "ROTHMANS", "AURIELO", 
    "SOLINOS", "CHLEBLO", "MARLBORO", "LD", "SOLINOS"
    )), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame"))

What should be done: a column which will show the share of every BRAND in percent should be added. First what I have done is create a new column "VALUE" this way:

df$VALUE <- with(df, AMOUNT*PRICE)

Now the column SHARE must be created in such a way: sum of values (from column VALUE) of a particular brand in a particular category divided by the sum of values of the whole category. For example, "ROTHMANS" is in category CIG, the sum of values for it is 319.92, the sum for the whole CIG category is 2285,56, so, the SHARE=14%. And it should be calculated in each case. I think dplyr library can be suitable, but can't find a solution.


Solution

  • You may first sum the BRAND value and get the proportion for each CATEGORY.

    library(dplyr)
    
    df %>%
      group_by(CATEGORY, BRAND) %>%
      summarise(VALUE = sum(VALUE)) %>%
      mutate(SHARE = prop.table(VALUE) * 100) %>%
      ungroup
    
    #  CATEGORY BRAND     VALUE SHARE
    #  <chr>    <chr>     <dbl> <dbl>
    #1 ARI      AURIELO    64.8  75.3
    #2 ARI      CHLEBLO    21.3  24.7
    #3 BET      ALLINO     41.4  61.5
    #4 BET      SOLINOS    26    38.5
    #5 CIG      LD       1404    61.4
    #6 CIG      MARLBORO  562.   24.6
    #7 CIG      ROTHMANS  320.   14.0
    #8 CRR      SOLINOS    43.9 100