Adding properties to SDF file in python3


I am new to RDKit, so excuse me if the question sounds very basic. I have an SDF file with several molecules, and I would like to add certain properties to each entry. How can I achieve this? My sample data looks like this:

D00AAN
  -OEChem-10101305022D

100108  0     1  0  0  0  0  0999 V2000
    2.0000    5.1929    0.0000 Cl  0  0  0  0  0  0  0  0  0  0  0  0
    5.2896    2.9173    0.0000 S   0  0  0  0  0  0  0  0  0  0  0  0
    6.3905   -0.2731    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    3.8629   -5.1929    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1 53  1  0  0  0  0
  2  5  1  0  0  0  0
  2  6  2  0  0  0  0
M  END

$$$$

D00AAU
  -OEChem-10101305022D

 42 43  0     1  0  0  0  0  0999 V2000
    6.3301    3.2500    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    2.0000   -3.2500    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    4.5981    0.2500    0.0000 C   0  0  3  0  0  0  0  0  0  0  0  0
  1 15  1  0  0  0  0
  1 41  1  0  0  0  0
  2 16  1  0  0  0  0
  2 42  1  0  0  0  0
  3  4  1  0  0  0  0
  3  5  1  0  0  0  0
  3  8  1  0  0  0  0
M  END

$$$$

I would like to add a line like the following after each molecule entry, where id is that molecule's title:

>  <ID>  id

The expected output is:

D00AAN
  -OEChem-10101305022D

100108  0     1  0  0  0  0  0999 V2000
    2.0000    5.1929    0.0000 Cl  0  0  0  0  0  0  0  0  0  0  0  0
    5.2896    2.9173    0.0000 S   0  0  0  0  0  0  0  0  0  0  0  0
    6.3905   -0.2731    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    3.8629   -5.1929    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1 53  1  0  0  0  0
  2  5  1  0  0  0  0
  2  6  2  0  0  0  0
M  END
>  <ID>  D00AAN
$$$$

D00AAU
  -OEChem-10101305022D

 42 43  0     1  0  0  0  0  0999 V2000
    6.3301    3.2500    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    2.0000   -3.2500    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    4.5981    0.2500    0.0000 C   0  0  3  0  0  0  0  0  0  0  0  0
  1 15  1  0  0  0  0
  1 41  1  0  0  0  0
  2 16  1  0  0  0  0
  2 42  1  0  0  0  0
  3  4  1  0  0  0  0
  3  5  1  0  0  0  0
  3  8  1  0  0  0  0
M  END
>  <ID>  D00AAU
$$$$


Solution

  • To copy each molecule's title into a property:

    • read the .sdf with Chem.SDMolSupplier()

    • write the result to a new file with Chem.SDWriter('new.sdf'); pointing the writer at 'old.sdf' while the supplier is still reading it can truncate the input

    • get the title with GetProp('_Name')

    • set the title as a property with SetProp('ID', title)

    This code should work:

    from rdkit import Chem
    
    suppl = Chem.SDMolSupplier('old.sdf')
    
    w = Chem.SDWriter('new.sdf')  # separate output file; the input is still being read
    
    for m in suppl:
        if m is None:             # record that RDKit could not parse
            continue
        n = m.GetProp('_Name')    # title (first line of the record)
        m.SetProp('ID', n)        # written out as the <ID> data item
        w.write(m)
    
    w.close()
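If RDKit cannot parse some records (the supplier then yields None for them), a plain-text pass over the file is one possible fallback. The sketch below does not use RDKit; the function name add_title_as_tag and the placeholder molecule blocks are hypothetical. It assumes every record has a non-blank title and that blank lines around a record are padding. The value goes on the line after the `>  <ID>` header and is followed by a blank line, which is the usual SDF layout for data items, so the result differs slightly from the single-line form shown in the question.

```python
def add_title_as_tag(sdf_text, tag="ID"):
    """Append a `>  <tag>` data item holding each record's title line."""
    out, record = [], []
    for line in sdf_text.splitlines():
        if line.strip() != "$$$$":
            record.append(line)
            continue
        # Assumption: blank lines around a record are padding, not content.
        while record and not record[0].strip():
            record.pop(0)
        while record and not record[-1].strip():
            record.pop()
        # Assumption: the first remaining line is the (non-blank) title.
        title = record[0].strip() if record else ""
        # If the record already ends with data items, restore the blank
        # line that terminates the last one.
        if record and record[-1].strip() != "M  END":
            record.append("")
        out += record + [f">  <{tag}>", title, "", "$$$$"]
        record = []
    # Keep any non-blank text that follows the final delimiter.
    if any(l.strip() for l in record):
        out += record
    return "\n".join(out) + "\n"


# Placeholder records (empty molecule blocks), only to show the layout.
sample = """D00AAN
  -OEChem-10101305022D

  0  0  0     0  0  0  0  0  0999 V2000
M  END

$$$$

D00AAU
  -OEChem-10101305022D

  0  0  0     0  0  0  0  0  0999 V2000
M  END

$$$$
"""

result = add_title_as_tag(sample)
print(result)
```

Because this only edits text, it never checks that a record is chemically valid, so the RDKit version is preferable whenever the file parses cleanly.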