regex, chemistry, cheminformatics

How to use regexp to identify the number of hydrogens in a chemical formula?


Which expression should I use to identify the number of hydrogen atoms in a chemical formula?

For example:

C40H51N11O19 - 51 hydrogens

C2HO - 1 hydrogen

CO2 - no hydrogens (empty)

Any suggestions?

Thanks!

Cheers!


Solution

  • You can start with this regex:

    H\d*

    H matches the literal character H; \d* matches zero or more digits.

    See the example, and try other regexes yourself, at: https://regex101.com/r/vdvH8S/2

    But a regex won't convert the result for you; it only finds the matching text.

    You need to post-process the match as follows:

    • H with a number: extract the number
    • only H: 1
    • no match: 0
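
The three cases above can be sketched in Python roughly as follows. This is only an illustration, not part of the original answer: the helper name count_hydrogens is hypothetical, and the negative lookahead (?![a-z]) is an added assumption so that the H in symbols such as He or Hg is not counted as hydrogen. It also assumes hydrogen appears at most once in the formula, as in the examples from the question.

```python
import re

# Based on the answer's H\d*, with two additions (assumptions, not from the answer):
#   (?![a-z])  skips element symbols that merely start with H (He, Hg, Ho, ...)
#   (\d*)      captures the digits so they can be converted afterwards
HYDROGEN = re.compile(r"H(?![a-z])(\d*)")


def count_hydrogens(formula: str) -> int:
    """Return the hydrogen count for a formula in which H appears at most once."""
    match = HYDROGEN.search(formula)
    if match is None:
        return 0  # no match: no hydrogens
    digits = match.group(1)
    # H followed by digits: convert them; bare H: exactly one atom
    return int(digits) if digits else 1


for formula in ("C40H51N11O19", "C2HO", "CO2"):
    print(formula, count_hydrogens(formula))
```

For formulas that repeat hydrogen (for example CH3COOH), the same pattern could be used with re.finditer and the per-match counts summed instead of taking only the first match.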