r

Extract specific strings from character vector


Here is part of the vector:

x <- c("tr|A0A075F5C6|A0A075F5C64_MORGAN", "sp|AC087WPF7|AUTS2_MORGAN", 
       "tr|A0A087WPU4|CCC087WPU4_MORGAN", "tr|DAA08W8RK1|A0A087WRK1_MORGAN", 
       "tr|A0A087WRT4|AFW0987WRT4_MORGAN", "tr|A0A087WSP5|A0A087WSP5_MORGAN"
)

The part I am interested in is the text between the two | characters; for the first element, that is A0A075F5C6. I tried using a regex but could not target it. Can you help me with code that extracts only these strings from the character vector?


Solution

  • Assuming the vector is stored in x, we can use strsplit() and sapply() for a base R option:

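    # Split each element on a literal pipe and keep the second field;
    # USE.NAMES = FALSE stops sapply() from naming the result after the inputs.
    output <- sapply(x, function(s) strsplit(s, "\\|")[[1]][2], USE.NAMES = FALSE)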
    output
    
    [1] "A0A075F5C6" "AC087WPF7"  "A0A087WPU4" "DAA08W8RK1" "A0A087WRT4"
    [6] "A0A087WSP5"
    

    Another option, using gsub() to delete everything up to and including the first | and everything from the second | onward:

    output <- gsub("^.*?\\||\\|.*$", "", x)
    output
    
    [1] "A0A075F5C6" "AC087WPF7"  "A0A087WPU4" "DAA08W8RK1" "A0A087WRT4"
    [6] "A0A087WSP5"
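
A third possibility, offered here only as a sketch and not as part of the original answer, is to capture the middle field with sub() and a single capture group. It avoids lazy quantifiers by matching "anything that is not a pipe" with [^|]*. The variable names x and ids are illustrative, and the two sample strings are copied from the question.

```r
# Two sample strings taken from the question; `x` and `ids` are illustrative names.
x <- c("tr|A0A075F5C6|A0A075F5C64_MORGAN", "sp|AC087WPF7|AUTS2_MORGAN")

# ^[^|]*\\|   : the first field and the pipe that follows it
# ([^|]*)     : the second field, captured as group 1
# \\|.*$      : the second pipe and everything after it
# The whole string is replaced by the captured group.
ids <- sub("^[^|]*\\|([^|]*)\\|.*$", "\\1", x)
ids
```

Note that sub() returns a string unchanged when the pattern does not match, so an element without two pipes would come back as-is instead of as NA. If that matters for your data, check with grepl() first.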