Search code examples
rdataframeaggregater-faq

How to sum a variable by group


I have a data frame with two columns. First column contains categories such as "First", "Second", "Third", and the second column has numbers that represent the number of times I saw the specific groups from "Category".

For example:

Category     Frequency
First        10
First        15
First        5
Second       2
Third        14
Third        20
Second       3

I want to sort the data by Category and sum all the Frequencies:

Category     Frequency
First        30
Second       5
Third        34

How would I do this in R?


Solution

  • Using aggregate:

    aggregate(x$Frequency, by=list(Category=x$Category), FUN=sum)
      Category  x
    1    First 30
    2   Second  5
    3    Third 34
    

    In the example above, multiple dimensions can be specified in the list. Multiple aggregated metrics of the same data type can be incorporated via cbind:

    aggregate(cbind(x$Frequency, x$Metric2, x$Metric3) ...
    

    (embedding @thelatemail comment), aggregate has a formula interface too

    aggregate(Frequency ~ Category, x, sum)
    

    Or if you want to aggregate multiple columns, you could use the . notation (works for one column too)

    aggregate(. ~ Category, x, sum)
    

    or tapply:

    tapply(x$Frequency, x$Category, FUN=sum)
     First Second  Third 
        30      5     34 
    

    Using this data:

    x <- data.frame(Category=factor(c("First", "First", "First", "Second",
                                          "Third", "Third", "Second")), 
                        Frequency=c(10,15,5,2,14,20,3))