r, group-by, summarize

Extracting unique column combination and finding sum and count in R


I have a flight data set with four columns, as shown below.

Original:

I want an output with one row per unique combination of three columns (Origin, Destination, Airline), giving the sum of passengers and the number of rows for each combination. The result would look something like this.

Output:

I can do one part of this with the group_by function:

df %>% group_by(Origin, destination, carrier) %>% summarise(count = n())

How can I also include the sum of passengers?


Solution

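  • We can use dplyr: group by the three columns, then compute both the row count (n()) and the passenger total (sum(Passengers)) in a single summarise() call.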

    library(dplyr)
    df1 %>%
       group_by(Origin, Destination, Airline) %>%
       dplyr::summarise(count = n(), TotalPassengers = sum(Passengers))
    # Groups:   Origin, Destination [2]
    #  Origin Destination Airline count TotalPassengers
    #  <chr>  <chr>       <chr>   <int>           <dbl>
    #1 ABE    ATL         9A          2               3
    #2 ABE    ATL         DL          1               5
    #3 NYC    SFA         AA          3              21
    #4 NYC    SFA         DL          1               5
    

    data

    df1 <- data.frame(Origin = rep(c("ABE", "NYC"), c(3, 4)),
          Destination = rep(c("ATL", "SFA"), c(3, 4)),
          Airline = c("9A", "9A", "DL", "AA", "AA", "AA", "DL"),
          Passengers = c(2, 1, 5, 4, 10, 7, 5))
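
For comparison, the same grouped count and sum can probably be obtained without dplyr. The following is only a minimal base R sketch, not the method used in the answer above: it assumes a helper column named count (introduced here for illustration) filled with ones, so that summing it per group gives the number of rows, and it renames the summed Passengers column to TotalPassengers to match the dplyr output.

```r
# Sample data copied from the answer above.
df1 <- data.frame(Origin = rep(c("ABE", "NYC"), c(3, 4)),
                  Destination = rep(c("ATL", "SFA"), c(3, 4)),
                  Airline = c("9A", "9A", "DL", "AA", "AA", "AA", "DL"),
                  Passengers = c(2, 1, 5, 4, 10, 7, 5))

# Hypothetical helper column of ones: its per-group sum is the row count.
df1$count <- 1

# Sum both columns within each Origin/Destination/Airline combination.
res <- aggregate(cbind(count, Passengers) ~ Origin + Destination + Airline,
                 data = df1, FUN = sum)

# Rename the summed column to match the dplyr result.
names(res)[names(res) == "Passengers"] <- "TotalPassengers"

print(res)
```

The row order of res may differ from the dplyr result, so it is safer to look up groups by their key columns than by position.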