Tags: python, numpy, biopython, scientific-computing, protein-database

Biopython Array Addition Error (Open for all)


Let me explain a few things first. This code uses the Biopython module, so I'm including the details needed to follow the problem in case you are not familiar with it.

The code is:

#!/usr/bin/python

from Bio.PDB.PDBParser import PDBParser

import numpy as np

parser=PDBParser(PERMISSIVE=1)

structure_id="mode_7"
filename="mode_7.pdb"
structure=parser.get_structure(structure_id, filename)
model1=structure[0]
s=(124,3)
newc=np.zeros(s,dtype=np.float32)
coord=[]
#for chain1 in model1.get_list():
#   for residue1 in chain1.get_list():
#       ca1=residue1["CA"]
#       coord1=ca1.get_coord()
#       newc.append(coord1)
for i in range(0,29):
    model=structure[i]
    for chain in model.get_list():
        for residue in chain.get_list():
            ca=residue["CA"]
            coord.append(ca.get_coord())
    newc=np.add(newc,coord)

print(newc)

print("END")

A PDB file is a Protein Data Bank file. The file I'm working with can be downloaded from https://drive.google.com/open?id=0B8oUhqYoEX6YVFJBTGlNZGNBdlk

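If you uncomment the first for loop, you'll find that collecting get_coord() for the CA atom of every residue gives a (124,3) array of dtype float32 (each get_coord() call returns the three coordinates of one atom). Likewise, each pass of the second for loop is supposed to produce the same shape.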
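For context on the shapes involved: `np.add` converts the Python list `coord` into an array before adding, so a list of N coordinate vectors (each of shape (3,), like one `get_coord()` result) becomes an (N, 3) array. The sketch below uses random stand-in vectors instead of real PDB atoms; that substitution is an assumption made only to show the shapes.

```python
import numpy as np

# Stand-in for 124 CA coordinates: each entry mimics one get_coord() result,
# a float32 vector of shape (3,). Random values, not real PDB data.
rng = np.random.default_rng(0)
coord = [rng.random(3).astype(np.float32) for _ in range(124)]

# np.asarray (which np.add applies implicitly) stacks the vectors into one array.
as_array = np.asarray(coord)
print(as_array.shape, as_array.dtype)  # (124, 3) float32

# With matching shapes, adding the list to a (124, 3) array works.
total = np.zeros((124, 3), dtype=np.float32)
total = np.add(total, coord)
print(total.shape)  # (124, 3)
```

So the addition only succeeds while `coord` holds exactly as many vectors as `newc` has rows.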

Running the script raises a strange error:

Traceback (most recent call last):
  File "./average.py", line 27, in <module>
    newc=np.add(newc,coord)
ValueError: operands could not be broadcast together with shapes (124,3) (248,3)

I have no idea how it ends up with a (248,3) array. I just want to add each model's coord array to the running total in newc. I tried another modification of the code:

#!/usr/bin/python

from Bio.PDB.PDBParser import PDBParser

import numpy as np

parser=PDBParser(PERMISSIVE=1)

structure_id="mode_7"
filename="mode_7.pdb"
structure=parser.get_structure(structure_id, filename)
model1=structure[0]
s=(124,3)
newc=np.zeros(s,dtype=np.float32)
coord=[]
newc2=[]
#for chain1 in model1.get_list():
#   for residue1 in chain1.get_list():
#       ca1=residue1["CA"]
#       coord1=ca1.get_coord()
#       newc.append(coord1)
for i in range(0,29):
    model=structure[i]
    for chain in model.get_list():
        for residue in chain.get_list():
            ca=residue["CA"]
            coord.append(ca.get_coord())
    newc2=np.add(newc,coord)

print(newc)

print("END")

It gives the same error. Can you help?
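A Biopython-free reproduction of the same error may help isolate it. This sketch replaces the PDB atoms with dummy zero vectors (an assumption; only the list lengths matter) and keeps the original structure of the loop, where `coord` is created once before the outer loop.

```python
import numpy as np

N_RESIDUES = 124  # CA atoms per model, as in the question

newc = np.zeros((N_RESIDUES, 3), dtype=np.float32)
coord = []   # created once, outside the outer loop, as in the question
shapes = []  # len(coord) at each call to np.add
error = None

for i in range(3):  # stand-in for iterating over the models
    for _ in range(N_RESIDUES):
        coord.append(np.zeros(3, dtype=np.float32))  # dummy get_coord() result
    shapes.append(len(coord))
    try:
        newc = np.add(newc, coord)
    except ValueError as exc:  # the broadcasting error from the traceback
        error = exc
        break

print(shapes)  # [124, 248]
print(error)
```

The first pass succeeds; the second pass sees 248 vectors and fails with the same broadcasting error.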


Solution

  • I'm not sure I fully understand what you're doing, but it looks like you need to reset the coord list at the start of every iteration:

    for i in range(0,29):
        coord = []  # start a fresh list for each model
        model=structure[i]
        for chain in model.get_list():
            for residue in chain.get_list():
                ca=residue["CA"]
                coord.append(ca.get_coord())
        newc=np.add(newc,coord)
    

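    If you keep appending without clearing the list, you add another 124 items to coord on every iteration of the outer loop. After the first model the list holds 124 coordinates, matching newc; after the second it holds 248, so np.add fails with shapes (124,3) (248,3) during the second iteration.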
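    A minimal sketch of the corrected accumulation, assuming every model lists the same residues in the same order and every residue has a CA atom. It relies only on Biopython's standard container iteration (a Structure yields models, a Model yields chains, a Chain yields residues) plus `residue["CA"].get_coord()`. The helper name `average_ca_coords` is hypothetical, and dividing by the number of models to get a mean is an assumption about what average.py is meant to compute.

```python
import numpy as np


def average_ca_coords(structure):
    """Return the mean CA coordinates over all models, shape (n_residues, 3).

    Hypothetical helper. Assumes every model lists the same residues in the
    same order and that every residue has a "CA" atom.
    """
    total = None
    n_models = 0
    for model in structure:  # a Biopython Structure yields its Models
        # Fresh list for each model, so it never grows beyond one model.
        coord = [residue["CA"].get_coord()
                 for chain in model      # a Model yields Chains
                 for residue in chain]   # a Chain yields Residues
        # Accumulate in float64 to limit rounding error across many models.
        coord = np.asarray(coord, dtype=np.float64)
        total = coord if total is None else total + coord
        n_models += 1
    if n_models == 0:
        raise ValueError("structure contains no models")
    return total / n_models
```

    With the question's variables, calling average_ca_coords(structure) would replace the whole for i in range(0,29) loop. Iterating over the structure directly also avoids hard-coding the number of models.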