Search code examples
pythonchemistryrdkit

How do I retrieve the original atom indices after rdikit's Chem.RemoveAtom() reindexes them?


I am trying to make a function to form ester bonds between two molecules, and I am having a problem which is driving me absolutely nuts.

I have a function which identifies the middle carbon atom of a carboxyl site of a molecule, and it works correctly.

Then I use Chem.RemoveAtom() to remove that atom so that I can add the ester bond there later.

However, it reindexes the atoms in the molecule in a haphazard way.

I need to find a way to either a) prevent RDKit from reindexing my molecule until I can add the bond or b) find some way of mapping the old indices to the new indices.

Here is a working example of the phenomenon:

import rdkit.Chem as Chem
from rdkit.Chem import Draw

TMA = Chem.MolFromSmiles('OC(C1=CC(C(O)=O)=CC(C(O)=O)=C1)=O')
TPA = Chem.MolFromSmiles('O=C(O)C1=CC=C(C(O)=O)C=C1')

mol1 = TMA
carboxyl_idx1 = 1
editable_mol1 = Chem.RWMol(mol1)
editable_mol1.RemoveAtom(carboxyl_idx1)

mol2 = TMA
carboxyl_idx2 = 5
editable_mol2 = Chem.RWMol(mol2)
editable_mol2.RemoveAtom(carboxyl_idx2)

mol3 = TPA
carboxyl_idx3 = 7
editable_mol3 = Chem.RWMol(mol3)
editable_mol3.RemoveAtom(carboxyl_idx3)

mols = [mol1, editable_mol1, mol2, editable_mol2, mol3, editable_mol3]

for mol in mols:
    for atom in mol.GetAtoms():
        atom.SetProp('atomLabel', str(atom.GetIdx()))

img = Draw.MolsToGridImage(mols, molsPerRow=2, subImgSize=(300, 300))
print(carboxyl_idx1, carboxyl_idx2, carboxyl_idx3)
display(img)

It produces this: enter image description here

The new bond needs to be added to the site in the benzene ring where the carboxyl group was.

So in the first example, to carboxyl_idx, in the second, to carboxyl_idx-1, and in the third, to carboxyl_idx+1. It's always one of these three, but I need a reliable way of figuring out which index in editable_moli used to be the carboxyl_idx in moli before RemoveAtom reindexed everything.

Please help...


Solution

  • This is not an easy one and I think I figured it out for you. Long story short: using 'molAtomMapNumber' instead of 'atomLabel'

    from rdkit import Chem
    TMA = Chem.MolFromSmiles('OC(C1=CC(C(O)=O)=CC(C(O)=O)=C1)=O')
    TPA = Chem.MolFromSmiles('O=C(O)C1=CC=C(C(O)=O)C=C1')
    
    for i, atom in enumerate(TMA.GetAtoms()):
        # For each atom, set the property "molAtomMapNumber" to a custom number, let's say, the index of the atom in the molecule
        atom.SetProp("molAtomMapNumber", str(atom.GetIdx()+1))
    TMA
    

    Structure with Indices of TMA

    for i, atom in enumerate(TPA.GetAtoms()):
        # For each atom, set the property "molAtomMapNumber" to a custom number, let's say, the index of the atom in the molecule
        atom.SetProp("molAtomMapNumber", str(atom.GetIdx()+1))
    TPA
    

    Structure with Indices of TPA

    mol1 = TMA
    carboxyl_idx1 = 1
    editable_mol1 = Chem.RWMol(mol1)
    editable_mol1.RemoveAtom(carboxyl_idx1)
    editable_mol1
    
    mol2 = TMA
    carboxyl_idx2 = 5
    editable_mol2 = Chem.RWMol(mol2)
    editable_mol2.RemoveAtom(carboxyl_idx2)
    editable_mol2
    
    mol3 = TPA
    carboxyl_idx3 = 1
    editable_mol3 = Chem.RWMol(mol3)
    editable_mol3.RemoveAtom(carboxyl_idx3)
    editable_mol3
    
    mols=[mol1, editable_mol1, mol2, editable_mol2, mol3, editable_mol3]
    img = Draw.MolsToGridImage(mols, molsPerRow=2, subImgSize=(300, 300))
    img
    

    Final results

    Please let me know if there is any questions. Cheers!