Search code examples
rcsvsplitmultiple-columnsrows

Split multiple columns into multiple new rows using condition R


My question seems similar to old ones, but I could not find an answer to my specific problem from those, so I will ask my question aswell:

I have a dataframe in R:

AT_ID <- c(1,2,3)
DEPARTURE_AIRPORT <- c("ZRH","ZRH","ZRH")
STOPOVER_1 <- c(NA, "BEL", "DUB")
STOPOVER_2 <- c(NA, "RUO", NA)
ARRIVAL_AIRPORT <- c("IAD", "LAX","BUD")
intinerary_id <- c(NA,NA,NA)

test_df <- data.frame(AT_ID, DEPARTURE_AIRPORT, STOPOVER_1, STOPOVER_2, ARRIVAL_AIRPORT, intinerary_id)

print (test_df)

This data frame should be split up so that the flights are separate for each segment. So for flight AT_ID 1 the flight segments would be itinerary_id = 1 from ZRH to IAD, AT_ID 2 would the segment 1 from ZRH to BEL, segment 2 from BEL to RUO and segment 3 from RUO to LAX, each segment in a different row but with the same intinerary_id.

The result would look something like that:

AT_ID <- c(1,2,3,4,5,6)

DEPARTURE_AIRPORT <- c("ZRH","ZRH","BEL","RUO","ZRH","DUB")
ARRIVAL_AIRPORT <- c("IAD", "BEL","RUO", "LAX","DUB","BUD")
intinerary_id <- c(1,2,2,2,3,3)

test_df_target <- data.frame(AT_ID, DEPARTURE_AIRPORT, ARRIVAL_AIRPORT, intinerary_id)

print(test_df_target)

The split()- function and separate_columns did not work out for me as the columns are not all the same, some have NA and some 3 segments.

I hope my question is clear, otherwise please let me know, so I can specify.

Thank you in advance!


Solution

  • We could use pivot_longer like this:

    library(dplyr)
    library(tidyr)
    
    test_df %>% 
      pivot_longer(
        cols =c(DEPARTURE_AIRPORT, STOPOVER_1, STOPOVER_2),
        names_to = "name",
        values_to = "DEPARTURE_AIRPORT"
      ) %>% 
      filter(!is.na(DEPARTURE_AIRPORT)) %>% 
      mutate(intinerary_id = AT_ID,
             AT_ID = row_number())%>% 
      select(AT_ID, DEPARTURE_AIRPORT, ARRIVAL_AIRPORT, intinerary_id)
    
      AT_ID DEPARTURE_AIRPORT ARRIVAL_AIRPORT intinerary_id
      <int> <chr>             <chr>                   <dbl>
    1     1 ZRH               IAD                         1
    2     2 ZRH               LAX                         2
    3     3 BEL               LAX                         2
    4     4 RUO               LAX                         2
    5     5 ZRH               BUD                         3
    6     6 DUB               BUD                         3