
What is the difference between passing the result of Chem.MolFromSmiles directly and passing it via a variable?


If I do not store the rdkit.Chem.rdchem.Mol object in a variable, but instead pass the expression Chem.MolFromSmiles("<your-SMILES>") directly to another function, I get a different result than when I store it in a variable first.

Why is that?

>>> from rdkit.Chem import Descriptors
>>> from rdkit import Chem



>>> # direct approach
>>> print(Descriptors.TPSA(Chem.MolFromSmiles('OC(=O)P(=O)(O)O')))
94.83
>>> print(Descriptors.TPSA(Chem.MolFromSmiles('OC(=O)P(=O)(O)O'), includeSandP=True))
104.64000000000001



>>> # mol as variable approach
>>> mol = Chem.MolFromSmiles('OC(=O)P(=O)(O)O')
>>> print(Descriptors.TPSA(mol))
94.83
>>> print(Descriptors.TPSA(mol, includeSandP=True))
94.83

In my mind, the last print statement should also give a result of ~104.64.

This links you to the example that I am using: TPSA


Solution

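  • Late to the party, but @jasonharper is correct: the TPSA value is cached in the molecule object. The first call to Descriptors.TPSA(mol) stores its result on mol, and the later call with includeSandP=True returns that stored value instead of recomputing it. In the direct approach, each Chem.MolFromSmiles call creates a fresh molecule with nothing cached, so both values are computed from scratch. You can inspect the cached property like this: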

    from rdkit.Chem import Descriptors
    from rdkit import Chem
    
    mol = Chem.MolFromSmiles('OC(=O)P(=O)(O)O')
    tpsa = Descriptors.TPSA(mol)
    
    # Show cached property
    mol.GetPropsAsDict(includeComputed=True, includePrivate=True)
    
    Out:
    
    {'__computedProps': <rdkit.rdBase._vectNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEEE at 0x7fd38aa93740>,
     'numArom': 0,
     '_StereochemDone': 1,
     '_tpsaAtomContribs': <rdkit.rdBase._vectd at 0x7fd38aa932e0>,
     '_tpsa': 94.83}
    

    If you delete the two properties _tpsa and _tpsaAtomContribs, you should get the result you are expecting:

    mol.ClearProp("_tpsa")
    mol.ClearProp("_tpsaAtomContribs")
    
    tpsa_sandp = Descriptors.TPSA(mol, includeSandP=True)
    assert tpsa != tpsa_sandp
    

    The best way, though, is to simply bypass the cache using force=True:

    # force=True recomputes (here without S and P) instead of returning the cached value
    assert Descriptors.TPSA(mol, force=True) != tpsa_sandp
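
To make the mechanism easier to see without RDKit, here is a minimal toy sketch of a descriptor cache that behaves the way the answer describes. It is not RDKit's implementation: ToyMol, toy_tpsa, and the hard-coded numbers are hypothetical stand-ins chosen to mirror the values in the question. The assumption being illustrated is that the cached value is stored under a single key that does not record which options were used.

```python
# Toy model of a cached descriptor. All names here (ToyMol, toy_tpsa) and the
# hard-coded numbers are hypothetical; they only mimic the behaviour above.


class ToyMol:
    """Stand-in for a molecule object that carries a property dictionary."""

    def __init__(self):
        self.props = {}


def toy_tpsa(mol, include_s_and_p=False, force=False):
    """Return a fake TPSA, reusing the cached value unless force is set."""
    # The cache key does not record include_s_and_p, so a value cached
    # with one setting is returned for any later setting.
    if not force and "_tpsa" in mol.props:
        return mol.props["_tpsa"]
    value = 104.64 if include_s_and_p else 94.83
    mol.props["_tpsa"] = value
    return value


mol = ToyMol()
first = toy_tpsa(mol)                          # computed, then cached
cached = toy_tpsa(mol, include_s_and_p=True)   # cache hit: the flag is ignored
mol.props.pop("_tpsa")                         # analogous to mol.ClearProp("_tpsa")
cleared = toy_tpsa(mol, include_s_and_p=True)  # recomputed with the flag
forced = toy_tpsa(mol, force=True)             # bypasses the cache entirely

print(first, cached, cleared, forced)
```

In this sketch the second call returns the first call's value, which matches the "mol as variable" output in the question, while clearing the key or passing force=True triggers a fresh computation.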