Extract multiple chains from the CIF file (based on specified selections) and save them as one PDB

I have several CIF files, for each of them I'd like to extract 3 chains and save the three chains in one PDB file. Problem could be that some CIF structures can contain different chain-IDs of my desired chains: I solved it the way that I parsed RCSB webpage with Beautiful Soup and save the chain-IDs for each CIF file in corresponding yaml file. Then I open each CIF file as well as its yaml file, then I get correct chainIDs for each CIF from its yaml file.

Now I'd like to extract all three chains and save them. I tried to use something from similar questions this. However, the last line of my code (with io.save) returns error: IndexError: tuple index out of range

code:

cif_input = "AAA","BBB","CCC"
for ID in cif_input:
    cif_file = "{}.cif" .format(ID) #these are my CIF files
    
    yaml_file_protein = "{}.yaml" .format(ID) #these are yaml files for each of the CIF files, they contain information about chain IDs
    with open(yaml_file_protein, "r") as file_protein:
        proteins = yaml.load(file_protein, Loader=yaml.FullLoader) #loading the yaml file
        chain_1 = proteins["chain1"] #here I get information what ID has chain no.1, result in AAA.cif is for example: U
        chain_2 = proteins["chain2"] #here I get information what ID has chain no.2
        chain_3 = proteins["chain3"] #here I get information what ID has chain no.3
                
        structure = parser.get_structure("{}" .format(ID), "{}.cif" .format(ID))[0] #parsing corresponding structure (CIF file)
        
        #this is what I tried implement from similar questions: I need to extract all three chains and save them in one PDB, this have to be done for all CIF files
        class ChainSelect(Select):
            def accept_chain(self, chain):
                if chain.get_id()=='{}'.format(chain_1): #this is what I should select chainID of chain_1 based on what is specified above from the yaml file
                    return True
                if chain.get_id()=='{}'.format(chain_2):
                    return True
                if chain.get_id()=='{}'.format(chain_3):
                    return True
                else:
                    return False
        
        io = PDBIO()
        io.set_structure(structure)
        io.save("{}_selection.pdb" .format(ID), ChainSelect(chain))

I assume the problem will be somewhere in the "class ChainSelect..." but can't figure out what exactly could be causing such error. I'd be very helpful for any suggestions.

Edit: I also thought the problem could be in lines if chain.get_id()=='{}'.format(chain_X):, because it'd return e.g. U and not "U", thus I tried to edit it as if chain.get_id()==" '{}' ".format(chain_X): which return "U" , however the error is still the here. I tried this code with only one chain as well, but still the same error...

Solution

this code of mine works, reads '2ms2.pdb' , pdb that has Chains A,B,C

and saves "prova.pdb" with only Chains A,C. Without headers


from Bio.PDB import PDBParser, PDBIO, Select



parser=PDBParser(QUIET=True)


structure_1 = parser.get_structure('test', '2ms2.pdb')



for model in structure_1:
    
    print(model)
    
    for chain in model:
        
        print(chain, chain.get_id(), type(chain.get_id()))
        
        class ChainSelect(Select):
                    def accept_chain(self, chain):
                        if chain.get_id() == 'A':
                            return True
                    
                        if chain.get_id() == 'C':
                            return True
                        else:
                            return False
        
io = PDBIO()
io.set_structure(structure_1)
io.save("prova.pdb" , ChainSelect())

not sure where best place to put

class ChainSelect(Select):
            def accept_chain(self, chain):
                if chain.get_id() == 'A':
                    return True
            
                if chain.get_id() == 'C':
                    return True
                else:
                    return False

in this code though

Instruction here: https://biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ

Can I write PDB files?

Use the PDBIO class for this. It’s easy to write out specific parts of a structure too, of course.

Example: saving a structure

io = PDBIO()
io.set_structure(s)
io.save("out.pdb")

If you want to write out a part of the structure, make use of the Select class (also in PDBIO). Select has four methods:

accept_model(model)
accept_chain(chain)
accept_residue(residue)
accept_atom(atom)

By default, every method returns 1 (which means the model/chain/residue/atom is included in the output). By subclassing Select and returning 0 when appropriate you can exclude models, chains, etc. from the output. Cumbersome maybe, but very powerful. The following code only writes out glycine residues:

class GlySelect(Select):
    def accept_residue(self, residue):
        if residue.get_name() == "GLY":
            return 1
        else:
            return 0


io = PDBIO()
io.set_structure(s)
io.save("gly_only.pdb", GlySelect())

If this is all too complicated for you, the Dice module contains a handy extract function that writes out all residues in a chain between a start and end residue.