python regex chemistry

Isolate the first number after a letter with regular expressions


I am trying to parse a chemical formula that is given to me in Unicode, in the format C7H19N3.

I want to find the position of the first digit after each letter, i.e. 7 is at index 1 and the 1 of 19 is at index 3. With these positions, I want to insert "sub" in front of the digits.

My first few attempts looped through the string, trying to isolate the position of only the first digit of each number, but to no avail.

I think regular expressions can accomplish this, but I'm quite lost with them.

My end goal is to output the formula Csub7Hsub19Nsub3 so that my text editor can properly format it.


Solution

  • How about this?

    >>> import re
    >>> re.sub(r'(\d+)', r'sub\g<1>', "C7H19N3")
    'Csub7Hsub19Nsub3'
    

    (\d+) is a capturing group that matches one or more digits. \g<1> refers to that captured group in the replacement string, so re.sub puts "sub" in front of every run of digits.
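
    If the index positions from the question are also needed, here is a minimal sketch of one way to get them, assuming the formula contains only ASCII letters and digits. The variable names (formula, digit_starts, marked) are made up for illustration, and the lookbehind variant is an optional tightening, not part of the answer above.

```python
import re

formula = "C7H19N3"

# Start index of each run of digits, i.e. the first digit after each letter.
digit_starts = [m.start() for m in re.finditer(r"\d+", formula)]
print(digit_starts)  # [1, 3, 6]

# Variant of the substitution that only fires when the digits directly
# follow an ASCII letter (assumption: element symbols are plain A-Z/a-z).
marked = re.sub(r"(?<=[A-Za-z])(\d+)", r"sub\g<1>", formula)
print(marked)  # Csub7Hsub19Nsub3
```

    For a plain formula like this one, the lookbehind makes no difference to the result; it only matters if the string could start with a digit, such as a leading coefficient.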