
Converting molecule name to SMILES?


I was just wondering: is there any way to convert IUPAC or common molecule names to SMILES? I want to do this without having to convert every single one by hand using online tools. Any input would be much appreciated!

For background, I am currently working with Python and RDKit, and I wasn't sure whether RDKit could already do this and I just didn't know. My current data is in CSV format.

Thank you!


Solution

  • RDKit can't convert names to SMILES. The NCI/CADD Chemical Identifier Resolver (CIR) can convert names and other identifiers (such as CAS numbers) to SMILES, and it has a simple URL-based API, so you can do the conversion from a script.

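    from urllib.request import urlopen
    from urllib.parse import quote
    
    def CIRconvert(ids):
        # CIR URL pattern: /chemical/structure/<identifier>/smiles
        url = 'https://cactus.nci.nih.gov/chemical/structure/' + quote(ids) + '/smiles'
        try:
            # The response body is the SMILES string as plain text
            return urlopen(url, timeout=30).read().decode('utf8')
        except OSError:
            # HTTPError (e.g. when the name cannot be resolved), URLError and timeouts are all OSError subclasses
            return 'Did not work'
    
    identifiers = ['3-Methylheptane', 'Aspirin', 'Diethylsulfate', 'Diethyl sulfate', '50-78-2', 'Adamant']
    
    for ids in identifiers:
        print(ids, CIRconvert(ids))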
    

    Output

    3-Methylheptane CCCCC(C)CC
    Aspirin CC(=O)Oc1ccccc1C(O)=O
    Diethylsulfate CCO[S](=O)(=O)OCC
    Diethyl sulfate CCO[S](=O)(=O)OCC
    50-78-2 CC(=O)Oc1ccccc1C(O)=O
    Adamant Did not work
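Since the question mentions that the data is in a CSV file, here is a minimal sketch of batch-converting a whole column. It assumes the CSV has a header row and that the names sit in a column called `name` (both assumptions; change `name_column` to match your file). `cir_smiles` and `convert_csv` are hypothetical helper names, not part of any library. The real lookup needs network access to CIR; the `resolver` parameter lets you swap in a different lookup or a stub.

```python
import csv
from urllib.parse import quote
from urllib.request import urlopen

# CIR URL pattern, same as in the answer above
CIR_URL = 'https://cactus.nci.nih.gov/chemical/structure/{}/smiles'


def cir_smiles(name):
    """Return the SMILES that CIR reports for `name`, or None if the lookup fails."""
    try:
        with urlopen(CIR_URL.format(quote(name)), timeout=30) as resp:
            return resp.read().decode('utf8').strip()
    except OSError:
        # HTTPError, URLError and socket timeouts are all OSError subclasses
        return None


def convert_csv(in_path, out_path, name_column='name', resolver=cir_smiles):
    """Copy in_path to out_path with an extra 'smiles' column.

    Each distinct name is resolved only once; unresolved names get an empty cell.
    Returns the name -> SMILES (or None) lookup table.
    """
    cache = {}
    with open(in_path, newline='', encoding='utf8') as fin:
        reader = csv.DictReader(fin)
        fieldnames = list(reader.fieldnames or []) + ['smiles']
        rows = list(reader)
    for row in rows:
        name = row[name_column].strip()
        if name not in cache:
            cache[name] = resolver(name)  # one request per distinct name
        row['smiles'] = cache[name] or ''
    with open(out_path, 'w', newline='', encoding='utf8') as fout:
        writer = csv.DictWriter(fout, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return cache
```

For example, `convert_csv('compounds.csv', 'compounds_smiles.csv')` would write a copy of the file with a `smiles` column added (the file names here are hypothetical). Caching each distinct name avoids sending repeated requests to a public service. Because you are already using RDKit, you can then check each result with `Chem.MolFromSmiles`, which returns `None` for a SMILES string it cannot parse, and use `Chem.MolToSmiles` if you want RDKit's canonical form.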