python google-colaboratory bioinformatics rdkit cheminformatics

rdkit ArgumentError: Python argument types in rdkit.Chem.rdMolDescriptors.GetAtomPairFingerprint(str) did not match C++ signature:


I'm currently working with peptide data and am trying to extract an atom-pair fingerprint from a peptide dataset, to be used in a machine learning classifier.

I've read my peptide sequences into a list and am now iterating through it, converting each sequence to a SMILES string and creating a fingerprint from it. This raises the ArgumentError in the title, and I don't have any clue what's going wrong. Note: I am using Google Colab for this.

Here is my code:

pos = "/content/drive/MyDrive/pepfun/Training_format_pos (1).txt"

# pos sequences extract into list
f = open(pos, 'r')
file_contents = f.read()
data = file_contents
f.close()

newdatapos = data.splitlines()
print(newdatapos)

!pip install rdkit-pypi
import rdkit
from rdkit import Chem

# fingerprints for pos sequences

from rdkit.Chem.AtomPairs import Pairs
fingerprintpos = []
for item in newdatapos:
  converteditem = rdkit.Chem.MolToSmiles(Chem.MolFromFASTA(item))
  atompos = Pairs.GetAtomPairFingerprint(converteditem)  
  fingerprintpos.append(atompos)

print(fingerprintpos)

Any advice is greatly appreciated. Thank you!


Solution

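  • Fingerprints are calculated from Mol objects, not from SMILES strings. Chem.MolToSmiles returns a str, which is why the call GetAtomPairFingerprint(str) does not match the C++ signature. Dropping the SMILES conversion, i.e. converteditem = Chem.MolFromFASTA(item), should work.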
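
To make the fix concrete, here is a minimal sketch of the loop with the SMILES step removed. The helper name atom_pair_fingerprints and the sample sequences are hypothetical stand-ins for the question's file contents. Skipping blank lines and unparsable sequences is an added assumption, not something the question's code does.

```python
from rdkit import Chem
from rdkit.Chem.AtomPairs import Pairs


def atom_pair_fingerprints(sequences):
    """Return (sequence, fingerprint) pairs for the sequences RDKit can parse."""
    results = []
    for seq in sequences:
        seq = seq.strip()
        if not seq:
            continue  # skip blank lines from the input file
        mol = Chem.MolFromFASTA(seq)  # a Mol object, or None if parsing fails
        if mol is None:
            continue  # assumption: silently drop sequences RDKit cannot parse
        # Pass the Mol itself to the fingerprint function, not a SMILES string
        results.append((seq, Pairs.GetAtomPairFingerprint(mol)))
    return results


# Hypothetical sample peptides standing in for the lines of the input file
sample = ["GAVL", "ACDEFGHIK", ""]
fingerprints = atom_pair_fingerprints(sample)
for seq, fp in fingerprints:
    # GetNonzeroElements() maps each atom-pair code to its count
    print(seq, len(fp.GetNonzeroElements()))
```

Keeping the sequence next to its fingerprint makes it easier to line the features up with class labels later, since dropped sequences would otherwise shift the list positions.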