I have several CIF files, for each of them I'd like to extract 3 chains and save the three chains in one PDB file. Problem could be that some CIF structures can contain different chain-IDs of my desired chains: I solved it the way that I parsed RCSB webpage with Beautiful Soup and save the chain-IDs for each CIF file in corresponding yaml file. Then I open each CIF file as well as its yaml file, then I get correct chainIDs for each CIF from its yaml file.
Now I'd like to extract all three chains and save them. I tried to use something from similar questions this. However, the last line of my code (with io.save) returns error: IndexError: tuple index out of range
code:
cif_input = "AAA","BBB","CCC"
for ID in cif_input:
cif_file = "{}.cif" .format(ID) #these are my CIF files
yaml_file_protein = "{}.yaml" .format(ID) #these are yaml files for each of the CIF files, they contain information about chain IDs
with open(yaml_file_protein, "r") as file_protein:
proteins = yaml.load(file_protein, Loader=yaml.FullLoader) #loading the yaml file
chain_1 = proteins["chain1"] #here I get information what ID has chain no.1, result in AAA.cif is for example: U
chain_2 = proteins["chain2"] #here I get information what ID has chain no.2
chain_3 = proteins["chain3"] #here I get information what ID has chain no.3
structure = parser.get_structure("{}" .format(ID), "{}.cif" .format(ID))[0] #parsing corresponding structure (CIF file)
#this is what I tried implement from similar questions: I need to extract all three chains and save them in one PDB, this have to be done for all CIF files
class ChainSelect(Select):
def accept_chain(self, chain):
if chain.get_id()=='{}'.format(chain_1): #this is what I should select chainID of chain_1 based on what is specified above from the yaml file
return True
if chain.get_id()=='{}'.format(chain_2):
return True
if chain.get_id()=='{}'.format(chain_3):
return True
else:
return False
io = PDBIO()
io.set_structure(structure)
io.save("{}_selection.pdb" .format(ID), ChainSelect(chain))
I assume the problem will be somewhere in the "class ChainSelect..." but can't figure out what exactly could be causing such error. I'd be very helpful for any suggestions.
Edit: I also thought the problem could be in lines if chain.get_id()=='{}'.format(chain_X):
, because it'd return e.g. U and not "U", thus I tried to edit it as if chain.get_id()==" '{}' ".format(chain_X):
which return "U" , however the error is still the here. I tried this code with only one chain as well, but still the same error...
this code of mine works, reads '2ms2.pdb'
, pdb that has Chains A,B,C
and saves "prova.pdb"
with only Chains A,C. Without headers
from Bio.PDB import PDBParser, PDBIO, Select
parser=PDBParser(QUIET=True)
structure_1 = parser.get_structure('test', '2ms2.pdb')
for model in structure_1:
print(model)
for chain in model:
print(chain, chain.get_id(), type(chain.get_id()))
class ChainSelect(Select):
def accept_chain(self, chain):
if chain.get_id() == 'A':
return True
if chain.get_id() == 'C':
return True
else:
return False
io = PDBIO()
io.set_structure(structure_1)
io.save("prova.pdb" , ChainSelect())
not sure where best place to put
class ChainSelect(Select):
def accept_chain(self, chain):
if chain.get_id() == 'A':
return True
if chain.get_id() == 'C':
return True
else:
return False
in this code though
Instruction here: https://biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ
Can I write PDB files?
Use the PDBIO class for this. It’s easy to write out specific parts of a structure too, of course.
Example: saving a structure
io = PDBIO()
io.set_structure(s)
io.save("out.pdb")
If you want to write out a part of the structure, make use of the Select class (also in PDBIO). Select has four methods:
accept_model(model)
accept_chain(chain)
accept_residue(residue)
accept_atom(atom)
By default, every method returns 1 (which means the model/chain/residue/atom is included in the output). By subclassing Select and returning 0 when appropriate you can exclude models, chains, etc. from the output. Cumbersome maybe, but very powerful. The following code only writes out glycine residues:
class GlySelect(Select):
def accept_residue(self, residue):
if residue.get_name() == "GLY":
return 1
else:
return 0
io = PDBIO()
io.set_structure(s)
io.save("gly_only.pdb", GlySelect())
If this is all too complicated for you, the Dice module contains a handy extract function that writes out all residues in a chain between a start and end residue.