Search code examples
c#data-structureschemistry

Preventing double hash operation when trying to update value in a of type Dictionary<IComparable, int>


I am working on software for scientific research that deals heavily with chemical formulas. I keep track of the contents of a chemical formula using an internal Dictionary<Isotope, int> where Isotope is an object like "Carbon-13", "Nitrogen-14", and the int represents the number of those isotopes in the chemical formula. So the formula C2H3NO would exist like this:

{"C12", 2
"H1", 3
"N14", 1
"O16", 1}  

This is all fine and dandy, but when I want to add two chemical formulas together, I end up having to calculate the hash function of Isotope twice to update a value, see follow code example.

public class ChemicalFormula {
    internal Dictionary<Isotope, int> _isotopes = new Dictionary<Isotope, int>();

    public void Add(Isotope isotope, int count)
    {
        if (count != 0)
        {
            int curValue = 0;
            if (_isotopes.TryGetValue(isotope, out curValue))
            {
                int newValue = curValue + count;
                if (newValue == 0)
                {
                    _isotopes.Remove(isotope);
                }
                else
                {
                    _isotopes[isotope] = newValue;
                }
            }
            else
            {
                _isotopes.Add(isotope, count);
            }
            _isDirty = true;
        }
    }
}

While this may not seem like it would be a slow down, it is when we are adding billions of chemical formulas together, this method is consistently the slowest part of the program (>45% of the running time). I am dealing with large chemical formulas like "H5921C3759N1023O1201S21" that are consistently being added to by smaller chemical formulas.

My question is, is there a better data structure for storing data like this? I have tried creating a simple IsotopeCount object that contains a int so I can access the value in a reference-type (as opposed to value-type) to avoid the double hash function. However, this didn't seem beneficial.

EDIT Isotope is immutable and shouldn't change during the lifetime of the program so I should be able to cache the hashcode.

I have linked to the source code so you can see the classes more in depth rather than me copy and paste them here.


Solution

  • I second the opinion that Isotope should be made immutable with precalculated hash. That would make everything much simpler.

    (in fact, functionally-oriented programming is better suited for calculations of such sort, and it deals with immutable objects)