python bioinformatics biopython protein-database

TypeError when creating PDB file using Biopython's PDBIO, only with certain files


I am writing a script that renumbers protein structures read from CIF files and then saves them as PDB files (Biopython does not have a CIF saving function).

For most of the files I use, it works. But for files like 6ek0.pdb, 5t2c.pdb, and 4v6x.pdb I keep getting the same TypeError at the same line of the io.save function. The error also occurs when I do not renumber the file at all and only read the input and write the output, like this:

from Bio import PDB

io = PDB.PDBIO()
pdb_parser = PDB.MMCIFParser()
pdbfile = '/Users/jbibbe/Documents/2018Masterstage_2/Scripts_part2/PDBfiles/5t2c.cif'
structure = pdb_parser.get_structure(' ', pdbfile)
io.set_structure(structure)
io.save(pdbfile[:-4] + '_test.pdb')

The error is:

Traceback (most recent call last):
  File "/Users/jbibbe/Documents/2018Masterstage_2/Scripts_part2/testerfile.py", line 8, in <module>
    io.save(pdbfile[:-4] + '_test.pdb')
  File "/Users/jbibbe/anaconda2/lib/python2.7/site-packages/Bio/PDB/PDBIO.py", line 222, in save
    resseq, icode, chain_id)
  File "/Users/jbibbe/anaconda2/lib/python2.7/site-packages/Bio/PDB/PDBIO.py", line 112, in _get_atom_line
    return _ATOM_FORMAT_STRING % args
TypeError: %c requires int or char

I looked at the code and the atom properties, but I could not see anything wrong with the types of the atom properties. Most of the values that go into _ATOM_FORMAT_STRING are checked thoroughly by Biopython, so I assumed their types were right.

I hope you can help me. If I can do something to improve this question, please let me know (I am new here).

Edit: To be clear, what I want to do is

  1. understand what went wrong
  2. save the structure

Solution

  • The error is triggered when Biopython tries to write a two-letter chain name using the %c format in _ATOM_FORMAT_STRING. %c accepts exactly one character, so a chain name such as "AA" raises the TypeError.

    More generally, big structures like 5T2C (a ribosome) cannot be written in the traditional PDB format. Many programs and libraries support two-character chain names (written in columns 21-22), but the standard allows only a single-character chain name in column 22. You also need some extension of atom numbering to support more than 99,999 atoms; the most popular one is hybrid-36.

    In short, Biopython's PDBIO does not support writing such big structures as PDB files.

    (If you describe exactly what you want to do, someone may be able to suggest another solution.)
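The failure described above can be reproduced without Biopython. The following is a minimal sketch, assuming that the chain name reaches a `%c` slot unchanged; `CHAIN_FIELD`, `format_chain_id`, and `find_long_chain_ids` are hypothetical names made up for this illustration and are not part of Biopython.

```python
# Simplified stand-in for the chain-name slot of a PDB ATOM record.
# This is NOT Biopython's real _ATOM_FORMAT_STRING, only its %c idea.
CHAIN_FIELD = "%c"


def format_chain_id(chain_id):
    """Format a chain name with %c, or return the error text on failure."""
    try:
        return CHAIN_FIELD % chain_id
    except TypeError as exc:
        # %c accepts a single character (or an int), nothing longer.
        return "TypeError: %s" % exc


def find_long_chain_ids(chain_ids):
    """Return the chain names that do not fit into a single PDB column."""
    return [cid for cid in chain_ids if len(cid) != 1]


print(format_chain_id("A"))    # a one-character name formats fine
print(format_chain_id("AA"))   # a two-character name triggers the TypeError
print(find_long_chain_ids(["A", "B", "AA", "SL"]))
```

With a parsed structure, the list of chain names could presumably be collected with something like `[chain.id for chain in structure.get_chains()]` and passed to `find_long_chain_ids` before calling `io.save`, to see in advance whether a file will hit this error.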