
Extract numbers from Chemical Formula in R


My data set (MSdata) looks something like this:

m.z         Intensity   Relative   Delta..ppm.   RDB.equiv.   Composition
301.14093   NA          100.00     -0.34         5.5          C16 H22 O4 Na
149.02331   4083458.5    23.60     -0.08         6.5          C8 H5 O3
279.15908   NA           18.64     -0.03         5.5          C16 H23 O4

and I would like it to look like this:

m.z         Intensity   Relative   Delta..ppm.   RDB.equiv.   C    H    O   Na
301.14093   NA          100.00     -0.34         5.5          16   22   4   1
149.02331   4083458.5    23.60     -0.08         6.5          8    5    3   0
279.15908   NA           18.64     -0.03         5.5          16   23   4   0

I have gotten as far as using this

library(stringr)
numextract <- function(string){
  unlist(regmatches(string, gregexpr("[[:digit:]]+\\.*[[:digit:]]*", string)))
}
MScomp <- numextract("C14 H18 O4 Na")

However, this gives me

'14' '18' '4'

I need the 'Na' string to give me a value of 1 or 0 (or NA). I'm new to coding and a lot of this is beyond me; I have been using this website to help me. I also have no idea how to merge these new columns (if this works) into my current matrix. The website I linked previously uses a newcol() function? Thanks for any help you might have to offer!


Solution

  • I have edited the code as needed:

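    library(tidyverse)  # attaches dplyr, tidyr and stringr

    # dat is the data frame from the question (called MSdata there)
    dat %>%
      # append "1" to element symbols that have no count, e.g. "Na" -> "Na1"
      mutate(Composition = gsub("\\b([A-Za-z]+)\\b", "\\11", Composition),
             name  = str_extract_all(Composition, "[A-Za-z]+"),
             value = str_extract_all(Composition, "\\d+")) %>%
      # one row per element, then one column per element (0 where absent)
      unnest(cols = c(name, value)) %>%
      spread(name, value, fill = 0)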
           m.z Intensity Relative Delta..ppm. RDB.equiv.    Composition  C  H Na O
    1 149.0233   4083459    23.60       -0.08        6.5       C8 H5 O3  8  5  0 3
    2 279.1591        NA    18.64       -0.03        5.5     C16 H23 O4 16 23  0 4
    3 301.1409        NA   100.00       -0.34        5.5 C16 H22 O4 Na1 16 22  1 4
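
    If you would rather avoid the tidyverse, the same idea can be sketched in base R. This is only a rough sketch, not the code from the answer above: `parse_formula` is a hypothetical helper name, the small `MSdata` frame is a stand-in for the real data, and it assumes every formula is written as space-separated tokens with each element appearing at most once. A symbol with no number is counted as 1 and an element missing from a formula becomes 0.

```r
# Hypothetical helper (not part of the original answer): turn a formula such
# as "C16 H22 O4 Na" into a named integer vector of atom counts.
parse_formula <- function(formula) {
  tokens   <- strsplit(trimws(formula), "\\s+")[[1]]
  elements <- sub("[0-9]+$", "", tokens)     # "C16" -> "C",  "Na" -> "Na"
  counts   <- sub("^[A-Za-z]+", "", tokens)  # "C16" -> "16", "Na" -> ""
  counts[counts == ""] <- "1"                # bare symbol means one atom
  setNames(as.integer(counts), elements)
}

# Stand-in for MSdata (assumption: only two of the columns, for brevity)
MSdata <- data.frame(
  m.z         = c(301.14093, 149.02331, 279.15908),
  Composition = c("C16 H22 O4 Na", "C8 H5 O3", "C16 H23 O4"),
  stringsAsFactors = FALSE
)

parsed       <- lapply(MSdata$Composition, parse_formula)
all_elements <- unique(unlist(lapply(parsed, names)))

# One row per formula, one column per element; absent elements stay 0
count_matrix <- do.call(rbind, lapply(parsed, function(p) {
  out <- setNames(integer(length(all_elements)), all_elements)
  out[names(p)] <- p
  out
}))

# cbind() attaches the count columns to the original data frame
result <- cbind(MSdata, count_matrix)
print(result)
```

    The `cbind()` call at the end is what merges the new columns into the existing data, which was the second part of the question. The counts come out as integers here, whereas in the `spread()` version the extracted values start out as character strings.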