
Regex to extract two specific words from a string


I am parsing some files and had planned to extract the information from within each file, but this failed because of special characters. The words I need are also contained in the filename, along with other information.

I assume they can be extracted with a suitable regular expression, but I have not managed to write one. The origin is the word between the second-to-last and the last underscore. The destination is the word between the last underscore and the .rds extension.

name1<-"2020-06-15 11_41_40_Magdeburg_Bitterfeld-Wolfen.rds"
name2<-"2020-06-15 11_41_53_Niebüll_Sylt OT Westerland.rds"
name3<-"2020-06-15 11_41_57_Augsburg_Düsseldorf.rds"

I am parsing each file separately and have provided three examples. I would expect

name1_origin<-"Magdeburg"
name1_dest<- "Bitterfeld-Wolfen"
name2_origin<-"Niebüll"
name2_dest<- "Sylt OT Westerland"
name3_origin<-"Augsburg"
name3_dest<- "Düsseldorf"

Solution

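  • You can use str_match from stringr. The leading greedy .* consumes everything up to the second-to-last underscore, so the two capture groups pick up the origin and the destination, and [, -1] drops the full-match column: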

    stringr::str_match(c(name1, name2, name3), '.*_(.*)_(.*)\\.rds')[, -1]
    
    #     [,1]        [,2]                
    #[1,] "Magdeburg" "Bitterfeld-Wolfen" 
    #[2,] "Niebüll"   "Sylt OT Westerland"
    #[3,] "Augsburg"  "Düsseldorf"
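
If you would rather avoid the stringr dependency, the same idea can probably be expressed in base R with regexec and regmatches. The following is only a sketch: the helper name extract_route and the returned column names are made up for illustration, and the pattern assumes that neither the origin nor the destination contains an underscore.

```r
# Hypothetical helper (not part of the original answer); base R only.
extract_route <- function(filenames) {
  # "^.*_" greedily consumes everything up to the second-to-last underscore.
  # "[^_]*" captures a field that contains no underscore itself.
  pattern <- "^.*_([^_]*)_([^_]*)\\.rds$"
  matches <- regmatches(filenames, regexec(pattern, filenames))

  # Each successful match is c(full_match, origin, destination);
  # a filename that does not match yields character(0), mapped to NA here.
  parts <- vapply(
    matches,
    function(m) if (length(m) == 3) m[2:3] else c(NA_character_, NA_character_),
    character(2)
  )

  data.frame(
    origin = unname(parts[1, ]),
    destination = unname(parts[2, ]),
    stringsAsFactors = FALSE
  )
}

name1 <- "2020-06-15 11_41_40_Magdeburg_Bitterfeld-Wolfen.rds"
name2 <- "2020-06-15 11_41_53_Niebüll_Sylt OT Westerland.rds"
name3 <- "2020-06-15 11_41_57_Augsburg_Düsseldorf.rds"

routes <- extract_route(c(name1, name2, name3))
```

Using [^_]* instead of .* for the capture groups, together with the ^ and $ anchors, makes the match a little stricter than the pattern in the answer: a filename that does not end in .rds gives NA rather than a partial match.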