Search code examples
rgroup-bysumcountifsummarize

Count characters and summarize values per group


I can seem to find a proper code for my problem. I want to create groups and summarize (sum, count or length) other columns based on different conditions.

I've tried group_by and summarize with different conditions but haven't found anything that works yet.

I have a table similar to this:

data <- data.frame(Name= c('Anna', 'Anna', 'Anna', 'Anna', 'Anna',
                       'Bella', 'Bella', 'Bella', 'Camilla', 'Camilla'),
               Date= c('1.1.2021', '1.1.2021', '2.1.2021', '3.1.2021', '3.1.2021', 
                       '1.1.2021', '5.1.2021', '5.1.2021', '7.1.2021', '8.1.2021'),
               Item= c('Apple','Pear', 'Zucini','Apple', 'Broccoli',
                       'Apple','Pear','Apple','Apple', 'Tomato'),
               Category= c('Fruit', 'Fruit', 'Vegetable', 'Fruit', 'Vegetable', 
                           'Fruit', 'Fruit', 'Fruit', 'Fruit', 'Vegetable'),
               Weight_kg= c(0.2,0.3,0.5,0.4,1.1,
                            1,0.5,0.8,1.2,0.5)
               )

This would be my desired output:

desired_table <- data.frame(Name=c('Anna', 'Bella', 'Camilla'),
Shopping_days=c(3,2,2),
days_fruit=c(2,2,1),
days_vegetables=c(2,0,1),
Total_kg=c(2.5,2.3,1.7),
Fruit_kg=c(0.9,2.3,1.2),
Vegetables=c(1.6,0,0.5))

Ive tried many variations of a code similar to this one that obviously doesn't work:

data1 <- data %>%
group_by(Name) %>%
summarize(Shopping_days = length(unique(Date)),
days_fruit = length(unique(Date, Category='Fruit')),
days_vegetables = length(unique(Date, Category='Vegetables')),
Total_kg = sum(Weight_kg),
Fruit_kg = sum(Weight_kg, if Category=Fruit),
Vegetables_kg = sum(Weight_kg, if Category=Vegetables))

Any help would be much appreciated.


Solution

  • Using group_by and summarise :

    library(dplyr)
    
    data %>%
      group_by(Name) %>%
      summarise(Shopping_days = n_distinct(Date), 
                days_fruit = n_distinct(Item[Category == 'Fruit']), 
                days_vegetables = n_distinct(Item[Category == 'Vegetable']), 
                Total_kg = sum(Weight_kg), 
                Fruit_kg = sum(Weight_kg[Category == 'Fruit']), 
                Vegetables_kg = sum(Weight_kg[Category == 'Vegetable']))
    
    #  Name    Shopping_days days_fruit days_vegetables Total_kg Fruit_kg Vegetables_kg
    #  <chr>           <int>      <int>           <int>    <dbl>    <dbl>         <dbl>
    #1 Anna                3          2               2      2.5      0.9           1.6
    #2 Bella               2          2               0      2.3      2.3           0  
    #3 Camilla             2          1               1      1.7      1.2           0.5