I have a data frame as follows. For each ticket number, I would like to concatenate the names in row order, dropping successive repetitions, to identify how the ticket is handed from person to person.
ticket<- c("1", "1", "1", "2", "2", "2", "2")
name<- c("Olg", "Jan", "Jan", "Olg", "Jan", "Jan","Olg")
df<- data.frame(ticket, name)
I want to create a column called sequence that gives each ticket's path with successive repetitions suppressed, as shown (Olg-Jan-Jan becomes Olg-Jan, and Olg-Jan-Jan-Olg becomes Olg-Jan-Olg). Any suggestions? Thanks!
seq <- c("Olg-Jan", "Olg-Jan", "Olg-Jan", "Olg-Jan-Olg", "Olg-Jan-Olg", "Olg-Jan-Olg", "Olg-Jan-Olg")
name is a factor (if it isn't, as with data.frame() in R 4.0 and later, where strings are no longer converted to factors by default, it can be converted with factor()), so we use the underlying numeric factor codes to check for consecutive duplicates and remove them. We use dplyr so that we can easily group by ticket and chain functions together using the chaining operator (%>%).
library(dplyr)
df %>% group_by(ticket) %>%
  # factor() makes this work when name is a character column too
  filter(c(1, diff(as.numeric(factor(name)))) != 0) %>%
  summarise(sequence = paste(name, collapse = "-"))
  ticket    sequence
1      1     Olg-Jan
2      2 Olg-Jan-Olg
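To see why the filter step drops consecutive repeats, here is a minimal base R sketch on a single ticket's names. The variables codes and keep are illustrative names, not part of the answer's code.

```r
# Names for one ticket, stored as a factor
name <- factor(c("Olg", "Jan", "Jan", "Olg"))

# Underlying integer codes; levels are sorted, so Jan = 1 and Olg = 2
codes <- as.numeric(name)        # 2 1 1 2

# diff() is 0 exactly where a value repeats its predecessor;
# the leading 1 ensures the first row is always kept
keep <- c(1, diff(codes)) != 0   # TRUE TRUE FALSE TRUE

as.character(name[keep])         # "Olg" "Jan" "Olg"
```

Because only the comparison with the previous row matters, the later return to Olg is kept even though Olg appeared earlier.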
If you want to keep all the rows of the original data frame and just add the sequence, then you can left_join the output above to your original data frame:
df = df %>%
  left_join(df %>% group_by(ticket) %>%
              # factor() makes this work when name is a character column too
              filter(c(1, diff(as.numeric(factor(name)))) != 0) %>%
              summarise(sequence = paste(name, collapse = "-")),
            by = "ticket")
  ticket name    sequence
1      1  Olg     Olg-Jan
2      1  Jan     Olg-Jan
3      1  Jan     Olg-Jan
4      2  Olg Olg-Jan-Olg
5      2  Jan Olg-Jan-Olg
6      2  Jan Olg-Jan-Olg
7      2  Olg Olg-Jan-Olg
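As an alternative that does not depend on factor codes or on dplyr, the same result can probably be reached in base R with rle(), which collapses consecutive runs of equal values. This is a sketch of a different technique, not the approach used above; collapse_runs and paths are hypothetical names introduced for illustration.

```r
ticket <- c("1", "1", "1", "2", "2", "2", "2")
name <- c("Olg", "Jan", "Jan", "Olg", "Jan", "Jan", "Olg")
df <- data.frame(ticket, name)

# rle() returns one entry in $values per run of consecutive equal values;
# as.character() lets this work for both factor and character input
collapse_runs <- function(x) {
  paste(rle(as.character(x))$values, collapse = "-")
}

# One path per ticket, named by ticket number
paths <- tapply(df$name, df$ticket, collapse_runs)

# Look up each row's path by its ticket, keeping every original row
df$sequence <- as.vector(paths[as.character(df$ticket)])
```

Here the lookup by ticket name plays the role of the left_join, so no separate join step is needed.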