Aggregate dataframe by condition in R


I have the following DataFrame in R:

 Y        ...   Price   Year   Quantity   Country
 010190   ...   4781    2021   4          Germany
 010190   ...   367     2021   3          Germany
 010190   ...   4781    2021   6          France
 010190   ...   250     2021   3          France
 020190   ...   690     2021   NA         USA
 020190   ...   10      2021   6          USA
 ......   ...   ...     ....   ..         ...
 217834   ...   56      2021   3          USA
 217834   ...   567     2021   9          USA

As you can see, the values in the Y column start with 01, 02, ..., 21. I want to aggregate these rows from the 6-digit code down to its first 2 digits, grouping by the categorical columns (e.g. Country and Year) and summing the numeric columns (Quantity and Price). Rows that contain NAs should still be included in the calculation. In the end I want output like this:

 Y    Price   Year   Quantity   Country
 01   5148    2021   7          Germany
 01   5031    2021   9          France
 02   700     2021   6          USA
 ..   ....    ...    ....       ...
 21   623     2021   12         USA

Solution

  • Update, on request: the same aggregation written with across():

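    library(dplyr)
    df %>% 
      # keep only the first two characters of the 6-digit code
      mutate(Y = substr(Y, 1, 2)) %>% 
      group_by(Y, Year, Country) %>% 
      # sum both numeric columns, ignoring NAs; return an ungrouped result
      summarise(across(c(Price, Quantity), ~sum(., na.rm = TRUE)),
                .groups = "drop")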
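
    One caveat about the substr() step, as a hedged aside: it only gives the intended prefix if Y is stored as text. The question does not say how Y was read in, so the sketch below assumes, purely for illustration, that the codes might have been parsed as numbers, which drops the leading zero. The vectors y_chr, y_num and y_fixed are made-up names, not part of the original data.

```r
# Hypothetical codes: the same values stored as text and as numbers
y_chr <- c("010190", "020190", "217834")
y_num <- c(10190, 20190, 217834)  # what "010190" becomes if parsed as numeric

substr(y_chr, 1, 2)  # "01" "02" "21"
substr(y_num, 1, 2)  # "10" "20" "21"  (first two prefixes are wrong)

# Restore the six-digit width before taking the prefix
y_fixed <- sprintf("%06d", as.integer(y_num))
substr(y_fixed, 1, 2)  # "01" "02" "21"
```

    If Y is already a character column, as the <chr> type in the output suggests, no padding is needed.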
    

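    We can use substr() to take the first two characters of Y, then group_by() the shortened Y together with Year and Country, and summarise() Price and Quantity with sum(). Passing na.rm = TRUE makes sum() skip NAs instead of returning NA for the whole group: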

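    library(dplyr)
    df %>% 
      # keep only the first two characters of the 6-digit code
      mutate(Y = substr(Y, 1, 2)) %>% 
      group_by(Y, Year, Country) %>% 
      # na.rm = TRUE keeps rows with NA in the sums of the other column
      summarise(Price = sum(Price, na.rm = TRUE),
                Quantity = sum(Quantity, na.rm = TRUE),
                .groups = "drop")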
    
      Y      Year Country Price Quantity
      <chr> <dbl> <chr>   <dbl>    <dbl>
    1 01     2021 France   5031        9
    2 01     2021 Germany  5148        7
    3 02     2021 USA       700        6
    4 21     2021 USA       623       12
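
For readers without dplyr, the same grouping can probably be reproduced in base R with aggregate(). The following is a minimal sketch rather than part of the original answer: the data frame is rebuilt from the rows visible in the question (the elided "..." columns and rows are simply left out), and res is a name chosen here for illustration.

```r
# Sample rows copied from the question; elided columns and rows are omitted
df <- data.frame(
  Y        = c("010190", "010190", "010190", "010190",
               "020190", "020190", "217834", "217834"),
  Price    = c(4781, 367, 4781, 250, 690, 10, 56, 567),
  Year     = 2021,
  Quantity = c(4, 3, 6, 3, NA, 6, 3, 9),
  Country  = c("Germany", "Germany", "France", "France",
               "USA", "USA", "USA", "USA"),
  stringsAsFactors = FALSE
)

# Sum Price and Quantity per (2-digit prefix, Year, Country).
# na.rm = TRUE is forwarded to sum(), so the row with NA Quantity
# still contributes its Price to the group total.
res <- aggregate(
  df[c("Price", "Quantity")],
  by = list(Y = substr(df$Y, 1, 2), Year = df$Year, Country = df$Country),
  FUN = sum, na.rm = TRUE
)
res
```

The data-frame method of aggregate() is used here on purpose: the formula method defaults to na.action = na.omit, which would drop the whole row with the NA Quantity before summing unless na.action = na.pass is supplied.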