I am following these guidelines (although they are written for Python 2) to perform a search here, and the query I need is:
queryText = """
<?xml version="1.0" encoding="UTF-8"?>
<orgPdbQuery>
<queryType>org.pdb.query.simple.TreeEntityQuery</queryType>
<description>TaxonomyTree Search for OTHER SEQUENCES</description>
<t>1</t>
<n>694009</n>
<nodeDesc>OTHER SEQUENCES</nodeDesc>
</orgPdbQuery>
"""
I know this query is right because when I enter it into the second link, 'Sample XML Queries' (selecting 'Source Organism Browser (NCBI)'), I get this output (this is just the start of it):
383 results
1Q2W:1 1QZ8:1 1SSK:1 1UJ1:1 1UK2:1 1UK3:1 1UK4:1 1UW7:1 1WNC:1 1WOF:1 1WYY:1 1XAK:1 1YO4:1 1YSY:1 1Z1I:1 1Z1J:1 1ZV7:1 1ZV8:1 1ZV8:2 1ZVA:1 1ZVB:1 2A5A:1 2A5I:1 2A5K:1 2ACF:1 2AHM:1 2AHM:2 2AJF:2 2ALV:1 2AMD:1 2AMQ:1 2BEQ:1 2BEQ:2 2BEZ:1 2BEZ:2 2BX3:1 2BX4:1 2C3S:1 2CJR:1 2CME:1 2CME:2 2CME:3 2CME:4 2D2D:1 2DD8:3 2DUC:1 2FAV:1 2FE8:1 2FXP:1 2FYG:1 2G9T:1 2GA6:1 2GDT:1 2GHV:1 2GHW:1 2GIB:1 2GRI:1 2GT7:1 2GT8:1 2GTB:1 2GX4:1 2GZ7:1 2GZ8:1 2GZ9:1 2H2Z:1 2H85:1 2HOB:1 2HSX:1 2IDY:1 2JW8:1 2JZD:1 2JZE:1 2JZF:1 2K7X:1 2K87:1 2KAF:1 2KQV:1 2KQW:1 2KYS:1 2LIZ:1 2MM4:1 2OFZ:1 2OG3:1 2OP9:1 2OZK:1 2PWX:1 2Q6G:1 2QC2:1 2
I now want to replicate this search in Python 3, so I wrote this:
import urllib
import urllib.parse
import urllib.request
url = 'http://www.rcsb.org/pdb/rest/search'
queryText = """
<?xml version="1.0" encoding="UTF-8"?>
<orgPdbQuery>
<queryType>org.pdb.query.simple.TreeEntityQuery</queryType>
<description>TaxonomyTree Search for OTHER SEQUENCES</description>
<t>1</t>
<n>694009</n>
<nodeDesc>OTHER SEQUENCES</nodeDesc>
</orgPdbQuery>
"""
encoded_data = urllib.parse.urlencode(queryText).encode('utf-8')
req = urllib.request.Request(url)
with urllib.request.urlopen(req,data=encoded_data) as f:
    resp = f.read()
print(resp)
I get the error:
Traceback (most recent call last):
File "/Users/slowat/anaconda/envs/py3/lib/python3.6/urllib/parse.py", line 892, in urlencode
raise TypeError
TypeError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "generate_pdbs_from_rcsb.py", line 19, in <module>
encoded_data = urllib.parse.urlencode(queryText).encode('utf-8')
File "/Users/slowat/anaconda/envs/py3/lib/python3.6/urllib/parse.py", line 900, in urlencode
"or mapping object").with_traceback(tb)
File "/Users/slowat/anaconda/envs/py3/lib/python3.6/urllib/parse.py", line 892, in urlencode
raise TypeError
TypeError: not a valid non-string sequence or mapping object
Could someone demonstrate how to get this code to work?
Update 1: I also tried:
url = 'http://www.rcsb.org/pdb/rest/search'
d = dict(queryType='org.pdb.query.simple.TreeEntityQuery',n='694009')
f = urllib.parse.urlencode(d)
f = f.encode('utf-8')
req = urllib.request.Request(url,f)
with urllib.request.urlopen(req) as f:
    resp = f.read()
print(resp)
which has the output:
'Problem creating Query from XML: Content is not allowed in prolog.\nqueryType=org.pdb.query.simple.TreeEntityQuery&n=694009\n'
The urlencode function expects a dictionary of key: value pairs (or a sequence of two-element tuples), which is why passing it a plain string raises the TypeError. There is no need to use this function here, since you're submitting XML directly to the service. The data parameter should be bytes, so mark your queryText as a byte sequence instead of a string (this is specific to Python 3: the b before the opening """ makes it a bytes literal rather than a plain string):
import urllib
import urllib.parse
import urllib.request
url = 'http://www.rcsb.org/pdb/rest/search'
queryText = b"""
<?xml version="1.0" encoding="UTF-8"?>
<orgPdbQuery>
<queryType>org.pdb.query.simple.TreeEntityQuery</queryType>
<description>TaxonomyTree Search for OTHER SEQUENCES</description>
<t>1</t>
<n>694009</n>
<nodeDesc>OTHER SEQUENCES</nodeDesc>
</orgPdbQuery>
"""
req = urllib.request.Request(url)
# Send the raw XML bytes as the POST body; no URL-encoding is applied.
with urllib.request.urlopen(req, data=queryText) as f:
    resp = f.read()
print(resp)
This gives the result you expect back in resp.
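If you would rather keep the query as an ordinary str (for example, because you build it with string formatting), you can encode it yourself just before sending. The following is only a sketch: build_search_request, run_search, SEARCH_URL and QUERY_TEXT are names made up for this illustration, and stripping the leading blank line before the XML declaration is a precaution I am assuming is harmless, not something the service is known to require. It also shows why Update 1 failed: urlencode turns a dict into form data, which the service then tries to parse as XML.

```python
import urllib.parse
import urllib.request

# URL taken from the question.
SEARCH_URL = 'http://www.rcsb.org/pdb/rest/search'

# The query from the question, kept as a normal str.
QUERY_TEXT = """
<?xml version="1.0" encoding="UTF-8"?>
<orgPdbQuery>
<queryType>org.pdb.query.simple.TreeEntityQuery</queryType>
<description>TaxonomyTree Search for OTHER SEQUENCES</description>
<t>1</t>
<n>694009</n>
<nodeDesc>OTHER SEQUENCES</nodeDesc>
</orgPdbQuery>
"""


def build_search_request(query_text, url=SEARCH_URL):
    """Hypothetical helper: wrap an XML query string in a POST request."""
    # strip() drops the blank line before the XML declaration (assumed
    # precaution); encode() produces the bytes that urlopen requires.
    body = query_text.strip().encode('utf-8')
    # Supplying data makes urllib issue a POST instead of a GET.
    return urllib.request.Request(url, data=body)


def run_search(query_text, url=SEARCH_URL):
    """Hypothetical helper: send the query and return the decoded response."""
    with urllib.request.urlopen(build_search_request(query_text, url)) as f:
        return f.read().decode('utf-8')


# urlencode is for form fields: a dict becomes key=value pairs,
# which is exactly the non-XML body the service rejected in Update 1.
form_body = urllib.parse.urlencode(
    {'queryType': 'org.pdb.query.simple.TreeEntityQuery', 'n': '694009'}
)

# Passing a plain string instead raises the TypeError from the question.
try:
    urllib.parse.urlencode(QUERY_TEXT)
    string_rejected = False
except TypeError:
    string_rejected = True
```

Calling print(run_search(QUERY_TEXT)) would then perform the actual request. Nothing in the block touches the network until run_search is called, so the request construction can be checked offline.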