Search code examples
rsummarizecolumnsortinggroup

R group columns of return trips data


I have data of train trips and the number of delayed or cancelled trains that I would like to make the sum.

Start     End        Delayed Cancelled
Paris     Rome       1       0
Brussels  Berlin     4       6
Berlin    Brussels   6       2
Rome      Paris      2       1

How can I group the Start and End columns to make the sum of Paris-Rome and Rome-Paris and Brussels-Berlin and Berlin Brussels to make the total of delayed and cancelled trips by train line ?


Solution

  • Order alphabetically and summarise by group:

    df |> 
      group_by(route = if_else(Start < End, paste(Start, End, sep = "-"), paste(End, Start, sep = "-"))) |> 
      summarise(Delayed = sum(Delayed), Cancelled = sum(Cancelled))
    
    #   route           Delayed Cancelled
    #   <chr>             <int>     <int>
    # 1 Berlin-Brussels      10         8
    # 2 Paris-Rome            3         1
    

    Reproducible data:

    df = data.frame(
      Start     = c("Paris", "Brussels", "Berlin", "Rome"),
      End       = c("Rome", "Berlin", "Brussels", "Paris"),
      Delayed   = c(1L, 4L, 6L, 2L),
      Cancelled = c(0L, 6L, 2L, 1L)
    )