I am trying to parse a chemical formula that is given to me in unicode in the format C7H19N3
I wish to isolate the position of the first number after the letter, I.e 7
is at index 1 and 1
is at index 3. With is this i want to insert "sub" infront of the digits
My first couple attempts had me looping though trying to isolate the position of only the first numbers but to no avail.
I think that Regular expressions can accomplish this, though im quite lost in it.
My end goal is to output the formula Csub7Hsub19Nsub3
so that my text editor can properly format it.
How about this?
>>> re.sub('(\d+)', 'sub\g<1>', "C7H19N3")
'Csub7Hsub19Nsub3'
(\d+)
is a capturing group that matches 1 or more digits. \g<1>
is a way of referring to the saved group in the substitute string.