
How can I determine the number of paraffinic CH3, CH2 and CH groups for any molecule with rdkit in Python?


I am trying to determine the number of paraffinic groups in any molecule using the rdkit package in Python. I started with paraffinic CH3 groups and need to extend this to paraffinic CH2 and paraffinic CH groups.

In the MWE below, I try to do this with a substructure match, which does not work as intended. I also looked for a suitable function in the Fragments module, but none is available for these groups.

How can I determine the number of paraffinic CH3, CH2 and CH groups for any molecule with rdkit in Python?

MWE

from rdkit import Chem
from rdkit.Chem import Descriptors, Draw, Fragments

smiles_n_decane = 'CCCCCCCCCC'
smiles_branched = 'CCC(C)(C)C(C)CC(C)(C)C'
smiles_carboxylic_acid = 'C1=CC=C2C(=C1)C(C3=CC=CC=C3O2)C(=O)O' # Xanthene-9-carboxylic acid

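m = Chem.MolFromSmiles(smiles_branched)

# Only reports whether a match exists (True/False), not how many
print(m.HasSubstructMatch(Chem.MolFromSmiles('[CH3]')))
# Counts aliphatic carboxylic acids, not CH3/CH2/CH groups
print(Fragments.fr_Al_COO(m))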

Problem example

For the molecule (2,2,4,5,5-pentamethylheptane) given below:

[Image: structure of 2,2,4,5,5-pentamethylheptane]

the code should give me the following outputs:

  • no. of CH3 groups: 7
  • no. of CH2 groups: 2
  • no. of CH groups: 1

Solution

  • Use SMARTS rather than SMILES for substructure queries. Also, GetSubstructMatches() returns all substructure matches, whereas HasSubstructMatch() only returns a boolean telling you whether the query matches at all, so the number of groups is the length of the returned tuple:

    ch3 = Chem.MolFromSmarts('[CH3]')
    ch2 = Chem.MolFromSmarts('[CH2]')
    ch1 = Chem.MolFromSmarts('[CH]')
    
    print("no. of CH3 groups:", len(m.GetSubstructMatches(ch3)))
    print("no. of CH2 groups:", len(m.GetSubstructMatches(ch2)))
    print("no. of CH groups:", len(m.GetSubstructMatches(ch1)))
    
    [Out]:
    no. of CH3 groups: 7
    no. of CH2 groups: 2
    no. of CH groups: 1
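
One caveat: [CH3], [CH2] and [CH] match any aliphatic carbon with that hydrogen count, so an olefinic =CH2 or =CH- carbon is counted as well. If "paraffinic" is taken to mean saturated (sp3) carbons only, which is an assumption about the intended definition, a minimal sketch is to add X4 (four total connections) to each pattern. The helper name count_paraffinic_groups and the PATTERNS dictionary are hypothetical names used for illustration only.

```python
from rdkit import Chem

# SMARTS restricted to saturated carbons: X4 = four total connections
# (assumption: "paraffinic" means sp3 carbon with the given H count).
PATTERNS = {
    "CH3": Chem.MolFromSmarts("[CX4;H3]"),
    "CH2": Chem.MolFromSmarts("[CX4;H2]"),
    "CH": Chem.MolFromSmarts("[CX4;H1]"),
}


def count_paraffinic_groups(smiles):
    """Return a dict with the number of sp3 CH3, CH2 and CH carbons."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError("could not parse SMILES: %r" % smiles)
    # Each pattern is a single atom, so every match is one carbon atom.
    return {
        name: len(mol.GetSubstructMatches(pattern))
        for name, pattern in PATTERNS.items()
    }


if __name__ == "__main__":
    # 2,2,4,5,5-pentamethylheptane from the question
    print(count_paraffinic_groups("CCC(C)(C)C(C)CC(C)(C)C"))
    # propene: the two olefinic carbons are not counted
    print(count_paraffinic_groups("CC=C"))
```

For the molecule in the question this gives the same 7 / 2 / 1 result as above, while for propene only the methyl carbon is counted. If ring carbons (for example in cyclohexane) should also be excluded, appending ;!R to the CH2 and CH patterns is one way to do it.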