I am parsing some files and had planned to extract the information from within each file, but this failed because of special characters. The words I need are also contained in the filename, along with other text.
I assume they could be extracted with a proper regular expression, but I have not managed to write one. The origin is the word between the second-to-last and the last underscore. The destination is the word between the last underscore and .rds.
name1<-"2020-06-15 11_41_40_Magdeburg_Bitterfeld-Wolfen.rds"
name2<-"2020-06-15 11_41_53_Niebüll_Sylt OT Westerland.rds"
name3<-"2020-06-15 11_41_57_Augsburg_Düsseldorf.rds"
I am parsing each file separately and have provided three examples. I would expect:
name1_orgin<-"Magdeburg"
name1_dest<- "Bitterfeld-Wolfen"
name2_orgin<-"Niebüll"
name2_dest<- "Sylt OT Westerland"
name3_orgin<-"Augsburg"
name3_dest<- "Düsseldorf"
You can use str_match:
stringr::str_match(c(name1, name2, name3), '.*_(.*)_(.*)\\.rds')[, -1]
#     [,1]        [,2]
#[1,] "Magdeburg" "Bitterfeld-Wolfen"
#[2,] "Niebüll"   "Sylt OT Westerland"
#[3,] "Augsburg"  "Düsseldorf"
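The pattern works because the leading .* is greedy: it consumes as much as it can and then backtracks only far enough to leave two underscores before .rds, so the two capture groups end up holding the last two underscore-separated fields. If you would rather not depend on stringr, the same idea can be sketched in base R with regexec() and regmatches(). This is only a sketch under a few assumptions: the names files, pattern, parts, origin and dest are made up for illustration, and [^_]* is used in place of .* to make the "no underscore inside a field" rule explicit.

```r
# Example filenames copied from the question.
files <- c(
  "2020-06-15 11_41_40_Magdeburg_Bitterfeld-Wolfen.rds",
  "2020-06-15 11_41_53_Niebüll_Sylt OT Westerland.rds",
  "2020-06-15 11_41_57_Augsburg_Düsseldorf.rds"
)

# [^_]* cannot cross an underscore, so the two groups are exactly the
# last two underscore-separated fields before the ".rds" extension.
pattern <- "^.*_([^_]*)_([^_]*)\\.rds$"

# regexec() finds the match positions; regmatches() turns them into strings.
# Each list element is c(full match, group 1, group 2).
parts <- regmatches(files, regexec(pattern, files))

# Hypothetical result names (not from the original post).
# A filename that does not match the pattern yields NA here.
origin <- vapply(parts, `[`, character(1), 2)
dest   <- vapply(parts, `[`, character(1), 3)

data.frame(origin, dest)
```

Since you are handling one file at a time, the same calls also work on a single filename; origin and dest are then length-one character vectors.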