java chemistry

Parsing a chemical formula


I'm trying to write a method for an app that takes a chemical formula like "CH3COOH" and returns some sort of collection containing its element symbols, one entry per atom.

CH3COOH would return [C,H,H,H,C,O,O,H]

I already have something that is kinda working, but it's very complicated and uses a lot of code with a lot of nested if-else structures and loops.

Is there a way I can do this using some kind of regular expression with String.split, or maybe with some other brilliantly simple code?


Solution

  • Assuming it's correctly capitalised, each symbol in the formula (together with its optional count) matches this regular expression:

    [A-Z][a-z]*\d*
    

    (For the chemically challenged, an element's symbol is always a capital letter, optionally followed by one or possibly two lower-case letters, e.g. Hg for mercury.)

    You can capture the element symbol and the number in groups like so:

    ([A-Z][a-z]*)(\d*)
    

    So yes, in theory this would be something regular expressions could help with. If you're dealing with formulae like C6H2(NO2)3(CH3)3 then your job is of course a bit harder...
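
As a rough illustration of the capture-group idea above, here is a minimal sketch that walks the formula with java.util.regex and repeats each symbol according to its count. The class name FormulaParser and the method parse are made up for this example, not from any library. It assumes a flat, correctly capitalised formula: parentheses such as those in C6H2(NO2)3(CH3)3 are not handled, and any characters the pattern does not match are silently skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name and method are illustrative only.
final class FormulaParser {

    // Group 1: element symbol; group 2: optional count (empty means 1).
    private static final Pattern ELEMENT = Pattern.compile("([A-Z][a-z]*)(\\d*)");

    // Expands a flat, correctly capitalised formula into one symbol per atom.
    // Assumption: no parentheses or nested groups in the input.
    static List<String> parse(String formula) {
        List<String> symbols = new ArrayList<>();
        Matcher m = ELEMENT.matcher(formula);
        while (m.find()) {
            String symbol = m.group(1);
            // An absent count means the element occurs once.
            int count = m.group(2).isEmpty() ? 1 : Integer.parseInt(m.group(2));
            for (int i = 0; i < count; i++) {
                symbols.add(symbol);
            }
        }
        return symbols;
    }
}
```

With this sketch, FormulaParser.parse("CH3COOH") gives [C, H, H, H, C, O, O, H], the output asked for in the question. Using Matcher.find rather than String.split keeps the symbol and its count together in one match, so no second pass is needed to separate them.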