Search code examples
pythonrdkit

How to highlight the substructure of a molecule with thick red lines in RDKit as SVG (high res)


I have the following code:

from rdkit import Chem
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem import rdDepictor
from rdkit.Chem.Draw import rdMolDraw2D
from IPython.display import SVG


m = Chem.MolFromSmiles('c1cc(C(=O)O)c(OC(=O)C)cc1')
substructure = Chem.MolFromSmarts('C(=O)O')
print(m.GetSubstructMatches(substructure))
m

Which produces the following plot.

enter image description here

However the code above doesn't produce the high resolution image. I'd like to have the SVG. I tried this:

drawer = rdMolDraw2D.MolDraw2DSVG(400,200)
drawer.DrawMolecule(m,highlightAtoms=m.GetSubstructMatch(Chem.MolFromSmarts('C(=O)O')))
drawer.FinishDrawing()
svg = drawer.GetDrawingText().replace('svg:','')
SVG(svg)

But I get:

enter image description here

What's the right way to do it?

The code can be tested in my Google Colab.


Solution

  • GetSubstructMatch returns only the first match. Use GetSubstructMatches. There are multiple scenarios here depending on the rdkit version you've installed. In the latest rdkit version (2021.09.2), the following code should work.

    from rdkit import Chem
    from rdkit.Chem.Draw import IPythonConsole
    from rdkit.Chem import rdDepictor
    from rdkit.Chem.Draw import rdMolDraw2D
    from IPython.display import SVG
    from copy import deepcopy
    
    
    def increase_resolution(mol, substructure, size=(400, 200)):
        mol = deepcopy(mol)
        substructure = deepcopy(substructure)
        drawer = rdMolDraw2D.MolDraw2DSVG(size[0], size[1])
        
        # highlightAtoms expects only one tuple, not tuple of tuples. So it needs to be merged into a single tuple
        matches = sum(mol.GetSubstructMatches(substructure), ())
        drawer.DrawMolecule(mol, highlightAtoms=matches)
        
        drawer.FinishDrawing()
        svg = drawer.GetDrawingText()
        
        return svg.replace('svg:','')
    
    
    mol = Chem.MolFromSmiles('c1cc(C(=O)O)c(OC(=O)C)cc1')
    substructure = Chem.MolFromSmarts('C(=O)O')
    SVG(increase_resolution(mol, substructure))
    

    If you're getting Value Error: Bad Conformer id error then either update the rdkit package to the latest version or try this:

    from rdkit import Chem
    from rdkit.Chem.Draw import IPythonConsole
    from rdkit.Chem import rdDepictor
    from rdkit.Chem.Draw import rdMolDraw2D
    from IPython.display import SVG
    from copy import deepcopy
    
    
    def increase_resolution(mol, substructure, size=(400, 200), kekulize=True):
        mol = deepcopy(mol)
        substructure = deepcopy(substructure)
        rdDepictor.Compute2DCoords(mol)
        if kekulize:
            Chem.Kekulize(mol) # Localize the benzene ring bonds
            
        drawer = rdMolDraw2D.MolDraw2DSVG(size[0], size[1])
        
        # highlightAtoms expects only one tuple, not tuple of tuples. So it needs to be merged into a single tuple
        matches = sum(mol.GetSubstructMatches(substructure), ())
        drawer.DrawMolecule(mol, highlightAtoms=matches)
        
        drawer.FinishDrawing()
        svg = drawer.GetDrawingText()
        return svg.replace('svg:','')
    
    
    mol = Chem.MolFromSmiles('c1cc(C(=O)O)c(OC(=O)C)cc1')
    substructure = Chem.MolFromSmarts('C(=O)O')
    SVG(increase_resolution(mol, substructure, kekulize=True))
    

    If for some cases like structures with chirality introduced in them as part of the SMILES string, it may fail to work. For such cases, set kekulize=False.