
Extracting character strings from a vector of strings using regex


I have a vector, files, that contains character strings like this one: "./bNTI/bNTI_Yazoo_weighted.csv". I want to extract the name between the two underscores, so in this example "Yazoo". I tried:

names <- gsub(".*_ (.+) _*", "\\1", files)

But that returned exactly the same values as the original files vector. Please help!


Solution
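A short sketch of why the attempt in the question returns files unchanged. It uses the single example path from the question as a stand-in for the real vector. The pattern ".*_ (.+) _*" contains literal spaces, so it only matches where an underscore is followed by a space. The file names contain no spaces, so gsub() finds no match and returns every element as-is. Dropping the spaces and turning the trailing "_*" into "_.*" lets the pattern match the whole string, so the replacement keeps only the captured group:

```r
# Stand-in for the real vector: only this one path is given in the question
files <- c("./bNTI/bNTI_Yazoo_weighted.csv")

# Original attempt: "_ " and " _" require spaces that the paths do not contain,
# so there is no match and gsub() returns the input unchanged
attempt <- gsub(".*_ (.+) _*", "\\1", files)
print(attempt)  # same as files

# Same idea without the spaces; "_.*" (not "_*") consumes the rest of the
# string, so the whole path is replaced by the captured group
fixed <- gsub(".*_(.+)_.*", "\\1", files)
print(fixed)    # "Yazoo"
```

Because the leading .* is greedy, this version returns the second-to-last underscore-separated field, which only coincides with the name when the name itself contains no underscore.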

  • Extract names between underscores

    names <- gsub(".*/bNTI_(.*)_weighted\\.csv", "\\1", files)
    
    # Print the extracted names
    print(names)
    
    • .* matches any character (except for a newline character) zero or more times.
    • /bNTI_ matches the literal "/bNTI_".
    • (.*) captures any character (except for a newline character) zero or more times (this is the part you want to extract).
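    • _weighted\\.csv matches the literal "_weighted.csv". The dot is escaped (\\. in the R string, i.e. \. in the regex) because an unescaped . would match any character.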

    Because the pattern matches the whole string, replacing it with \\1 (a back-reference to the content captured by the parentheses) leaves only the part between the underscores.
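The pattern above hard-codes the "/bNTI_" prefix and the "_weighted.csv" suffix. If the other paths in files follow the same prefix_NAME_suffix layout but with different prefixes or suffixes, a more general sketch (assuming the name itself never contains an underscore) is to drop the directory with basename() and keep the run of non-underscore characters after the first underscore. The second path below is hypothetical, added only to show the vectorised result, and extracted_names is an illustrative variable name:

```r
# Example input: the first path is from the question, the second is hypothetical
files <- c("./bNTI/bNTI_Yazoo_weighted.csv",
           "./bNTI/bNTI_Sunflower_weighted.csv")

# basename() removes the directory part ("./bNTI/"), leaving e.g.
# "bNTI_Yazoo_weighted.csv"
file_names <- basename(files)

# ^[^_]*_  : everything up to and including the first underscore
# ([^_]+)  : the name, captured; [^_] cannot cross an underscore
# _.*$     : the second underscore and the rest of the file name
extracted_names <- sub("^[^_]*_([^_]+)_.*$", "\\1", file_names)
print(extracted_names)  # c("Yazoo", "Sunflower")
```

sub() is enough here because the pattern can match each string at most once; gsub() gives the same result. Elements that do not match (for example a file name with only one underscore) are returned unchanged, so it can be worth checking them with grepl() on the same pattern.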