How to remove string before a certain delimiter in R?


I apologize in advance for the naivety of my question; I did not find a way to do this in R. I have several character strings that look like this:

sample.names1<-c("S0938-CR1","S0957-AB8","S0971-EGFP1-10")

I would like to remove everything up to and including the first "-", in order to keep only CR1, AB8 and EGFP1-10.

I tried

sample.names <- sapply(strsplit(basename(sample.names1), "-"), `[`, 2)

But this drops whatever comes after the second "-" (for "S0971-EGFP1-10" it returns "EGFP1" instead of "EGFP1-10"). Thank you!
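To see why the attempt loses the tail, it helps to look at what strsplit actually returns. The sketch below uses only base R (the intermediate name `pieces` is just for illustration): splitting at every "-" turns "S0971-EGFP1-10" into three pieces, so picking element 2 keeps only "EGFP1". The basename() call in the attempt has no effect here, since the strings contain no "/".

```r
sample.names1 <- c("S0938-CR1", "S0957-AB8", "S0971-EGFP1-10")

# strsplit() splits at *every* "-", so the third string yields three pieces
pieces <- strsplit(sample.names1, "-")
pieces[[3]]
## [1] "S0971" "EGFP1" "10"

# Taking only element 2 of each piece list therefore drops the trailing "10"
sapply(pieces, `[`, 2)
## [1] "CR1"   "AB8"   "EGFP1"
```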


Solution

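In (1), ^ matches the beginning of the string, .* matches any run of characters, and the ? makes that match non-greedy (shortest match), so it stops at the first - rather than the last. The - then matches itself, and sub replaces the whole match with "".

In (2), strcapture captures everything after the first - into a one-column data.frame, which we then reduce to a character vector with [[1]].

In (3), strsplit splits at every -; tail(x, -1) drops the first piece and paste(..., collapse = "-") joins the remaining pieces back together.

In (4), we replace the first - with a / and then, treating the result as a file path, extract its last component with basename. This assumes the strings contain no / of their own.

In (5), we use regexpr to find the position of the first - and then call substring with that position plus 1 to extract the rest of the string.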

    # 1
    sub("^.*?-", "", sample.names1)
    ## [1] "CR1"      "AB8"      "EGFP1-10"
    
    # 2
    strcapture("-(.*)", sample.names1, list(""))[[1]]
    ## [1] "CR1"      "AB8"      "EGFP1-10"
    
    # 3 (the \(x) shorthand for function(x) needs R >= 4.1.0)
    sapply(strsplit(sample.names1, "-"), \(x) paste(tail(x, -1), collapse = "-"))
    ## [1] "CR1"      "AB8"      "EGFP1-10"
    
    # 4
    basename(sub("-", "/", sample.names1))
    ## [1] "CR1"      "AB8"      "EGFP1-10"
    
    # 5
    substring(sample.names1, regexpr("-", sample.names1) + 1)
    ## [1] "CR1"      "AB8"      "EGFP1-10"
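The role of the ? in (1), and how the approaches behave on less tidy input, are easy to check directly. The sketch below uses a small hypothetical vector `x` holding one string with two "-" and one with none. It shows that dropping the ? makes .* greedy, so the match runs to the last "-", and that the approaches do not all agree when there is no "-" at all: (2) gives NA and (3) gives an empty string, while (1), (4) and (5) return the string unchanged.

```r
# Hypothetical test input: two dashes, and no dash at all
x <- c("S0971-EGFP1-10", "S0999")

# Without ?, .* is greedy and consumes up to the *last* "-"
sub("^.*-", "", x)
## [1] "10"    "S0999"

# With ?, the match stops at the first "-"
sub("^.*?-", "", x)
## [1] "EGFP1-10" "S0999"

# (2) strcapture: non-matching strings become NA
strcapture("-(.*)", x, list(""))[[1]]
## [1] "EGFP1-10" NA

# (3) strsplit: tail() of a single piece is empty, so paste() gives ""
sapply(strsplit(x, "-"), \(v) paste(tail(v, -1), collapse = "-"))
## [1] "EGFP1-10" ""

# (4) and (5): strings without "-" come back unchanged
basename(sub("-", "/", x))
## [1] "EGFP1-10" "S0999"
substring(x, regexpr("-", x) + 1)
## [1] "EGFP1-10" "S0999"
```

Which behaviour is preferable depends on whether a missing delimiter should be flagged (NA) or passed through.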
    

Note

The input as shown in the question:

    sample.names1 <- c("S0938-CR1", "S0957-AB8", "S0971-EGFP1-10")
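If the delimiter is not always "-", approach (5) generalizes readily. The function below is a hypothetical helper (the name `after_first` is not from any package) and assumes `sep` is a literal string rather than a regular expression: `fixed = TRUE` stops characters such as "." or "|" from being read as regex metacharacters, and strings that do not contain `sep` are returned unchanged.

```r
sample.names1 <- c("S0938-CR1", "S0957-AB8", "S0971-EGFP1-10")

# Hypothetical helper: drop everything up to and including the first
# occurrence of the literal delimiter `sep`.
after_first <- function(x, sep) {
  # fixed = TRUE matches sep as plain text; as.integer() strips the
  # match.length and other attributes that regexpr() attaches
  pos <- as.integer(regexpr(sep, x, fixed = TRUE))
  # regexpr() returns -1 when sep is absent: keep those strings as they are
  ifelse(pos > 0, substring(x, pos + nchar(sep)), x)
}

after_first(sample.names1, "-")
## [1] "CR1"      "AB8"      "EGFP1-10"

after_first(c("a.b.c", "abc"), ".")
## [1] "b.c" "abc"
```

The ifelse() guard matters for longer delimiters: for a `sep` of three or more characters, -1 + nchar(sep) is 2 or greater, so without the guard substring would silently drop leading characters from strings that lack the delimiter.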