R: splitting a string between two characters using strsplit()


Let's say I have the following string:

s <- "ID=MIMAT0027618;Alias=MIMAT0027618;Name=hsa-miR-6859-5p;Derives_from=MI0022705"

I would like to extract the substrings between each "=" and the following ";" (or the end of the string) to get the following output:

[1] "MIMAT0027618"  "MIMAT0027618"  "hsa-miR-6859-5p"  "MI0022705"

Can I use strsplit() with more than one split element?


Solution

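    1) strsplit with matrix. Yes: the split argument is a regular expression, so the character class "[;=]" splits on either character. Filling a two-row matrix with the pieces puts the values in the second row: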

    > matrix(strsplit(s, "[;=]")[[1]], 2)[2,]
    [1] "MIMAT0027618"    "MIMAT0027618"    "hsa-miR-6859-5p" "MI0022705"   
    

    2) strsplit with gsub. Remove each key together with its "=" using gsub, then split what remains on ";":

    > strsplit(gsub("[^=;]+=", "", s), ";")[[1]]
    [1] "MIMAT0027618"    "MIMAT0027618"    "hsa-miR-6859-5p" "MI0022705"     
    

    3) strsplit with sub. Split on ";" first, then strip everything up to and including the "=" from each piece:

    > sub(".*=", "", strsplit(s, ";")[[1]])
    [1] "MIMAT0027618"    "MIMAT0027618"    "hsa-miR-6859-5p" "MI0022705"   
    

    4) strapplyc. Using the gsubfn package, extract each run of non-semicolon characters that follows an equal sign:

    > library(gsubfn)
    > strapplyc(s, "=([^;]+)", simplify = unlist)
    [1] "MIMAT0027618"    "MIMAT0027618"    "hsa-miR-6859-5p" "MI0022705"  
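If installing gsubfn is not an option, the extraction in 4) can probably be approximated in base R with gregexpr() and regmatches(). This is a swapped-in alternative rather than one of the original solutions, and the variable names `m` and `values_base` are illustrative only. It assumes values never contain ";" or "=".

```r
s <- "ID=MIMAT0027618;Alias=MIMAT0027618;Name=hsa-miR-6859-5p;Derives_from=MI0022705"

# perl = TRUE enables the lookbehind (?<==), which requires an "=" just
# before the match without making it part of the match. [^;]+ then takes
# every character up to the next ";" or the end of the string.
m <- gregexpr("(?<==)[^;]+", s, perl = TRUE)

# regmatches() returns a list with one element per input string.
values_base <- regmatches(s, m)[[1]]
print(values_base)
```

Like strapplyc, this matches the values directly instead of splitting, so it does not depend on the number of pieces being even.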
    

    ADDED: additional strsplit solutions.
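To see why solutions 1) to 3) work, it may help to look at their intermediate values. The sketch below breaks each one-liner into steps. The names `pieces`, `kv`, `stripped` and `fields` are illustrative, and it assumes every field has exactly one "=" and a non-empty value (otherwise the two-row matrix in 1) would pair keys and values incorrectly).

```r
s <- "ID=MIMAT0027618;Alias=MIMAT0027618;Name=hsa-miR-6859-5p;Derives_from=MI0022705"

# 1) Splitting on either ";" or "=" yields alternating keys and values.
pieces <- strsplit(s, "[;=]")[[1]]
# matrix() fills column by column, so with two rows each column is one
# key/value pair: row 1 holds the keys, row 2 holds the values.
kv <- matrix(pieces, nrow = 2)
values <- kv[2, ]
named_values <- setNames(kv[2, ], kv[1, ])  # keep the keys as names

# 2) gsub() deletes every "key=" prefix; [^=;]+ cannot cross a ";",
# so only the key in front of each "=" is removed.
stripped <- gsub("[^=;]+=", "", s)
# stripped is "MIMAT0027618;MIMAT0027618;hsa-miR-6859-5p;MI0022705"

# 3) Splitting on ";" first gives one "key=value" field per element;
# sub() is vectorised and strips everything up to the "=" in each field.
fields <- strsplit(s, ";")[[1]]
values_sub <- sub(".*=", "", fields)

print(named_values)
```

Keeping the first matrix row as names is an optional extra: it lets a value be looked up by key, for example named_values["Name"].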