I am trying to send a GET request to the DSpace 5.5 REST API to check whether an item with a given handle is present in DSpace.
When I tested it in a browser, it worked fine (return code 200, and I got the data about the requested item).
Then I started testing the request with the Python 3 requests module in the Python console. Again, the DSpace API returned the correct response code (200) and JSON data in the response.
So I put the tested function into my script, and suddenly the DSpace API started returning error code 500. In the DSpace log I came across this error message:
org.dspace.rest.RestIndex @ REST Login Success for user: jakub.rihak@ruk.cuni.cz
2017-01-03 15:38:34,326 ERROR org.dspace.rest.Resource @ Something get wrong. Aborting context in finally statement.
2017-01-03 15:38:34,474 ERROR org.dspace.rest.Resource @ Something get wrong. Aborting context in finally statement.
2017-01-03 15:38:34,598 ERROR org.dspace.rest.Resource @ Something get wrong. Aborting context in finally statement.
According to the DSpace documentation, the request should look like this:
GET /handle/{handle-prefix}/{handle-suffix}
It points to the handle endpoint of our DSpace server, so the whole request should be sent to https://dspace.cuni.cz/rest/handle/123456789/937
(I think you can test it yourself).
In the browser I get the following response:
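The endpoint pattern above can be sketched as a small URL builder. This is a minimal sketch, not part of the original code: handle_endpoint_url and BASE_URL are hypothetical names, and percent-encoding each part with urllib.parse.quote is an illustrative choice, not something the DSpace documentation prescribes.

```python
from urllib.parse import quote

# Base URL taken from the post; adjust for another DSpace instance.
BASE_URL = "https://dspace.cuni.cz/rest/handle"


def handle_endpoint_url(handle: str, base_url: str = BASE_URL) -> str:
    """Build GET /handle/{prefix}/{suffix} from a "prefix/suffix" handle string."""
    prefix, sep, suffix = handle.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a prefix/suffix handle: {handle!r}")
    # Percent-encode each part separately, so unexpected characters
    # (spaces, newlines, ...) show up as %20, %0A, ... in the logged URL.
    return "/".join([base_url.rstrip("/"), quote(prefix, safe=""), quote(suffix, safe="")])


print(handle_endpoint_url("123456789/937"))
# https://dspace.cuni.cz/rest/handle/123456789/937
```

With a handle that carries a trailing newline, the same call yields a URL ending in 937%0A, which makes the problem obvious in a log.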
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<item>
<expand>metadata</expand>
<expand>parentCollection</expand>
<expand>parentCollectionList</expand>
<expand>parentCommunityList</expand>
<expand>bitstreams</expand>
<expand>all</expand>
<handle>123456789/937</handle>
<id>1423</id>
<name>Komparace vývoje české a slovenské pravicové politiky od roku 1989 do současnosti</name>
<type>item</type>
<archived>true</archived>
<lastModified>2016-12-20 17:52:30.641</lastModified>
<withdrawn>false</withdrawn>
</item>
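If you want to confirm that the response really describes the requested item, rather than relying on the status code alone, the XML above can be parsed with the standard library. A sketch using xml.etree.ElementTree on a trimmed copy of that response; item_summary is a hypothetical helper, and only the fields visible in the response above are assumed to exist.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the XML returned for handle 123456789/937 in the browser.
RESPONSE_XML = """<item>
<expand>metadata</expand>
<expand>all</expand>
<handle>123456789/937</handle>
<id>1423</id>
<type>item</type>
<archived>true</archived>
<withdrawn>false</withdrawn>
</item>"""


def item_summary(xml_text: str) -> dict:
    """Pull a few scalar fields out of a DSpace <item> response."""
    root = ET.fromstring(xml_text)
    # findtext returns the text of the first matching child element
    return {
        "handle": root.findtext("handle"),
        "id": int(root.findtext("id")),
        "archived": root.findtext("archived") == "true",
        "withdrawn": root.findtext("withdrawn") == "true",
    }


summary = item_summary(RESPONSE_XML)
print(summary["handle"] == "123456789/937")  # True
```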
When testing in the Python console, my code looked like this:
from urllib.parse import urljoin
import requests

def document_in_dspace(handle):
    url = 'https://dspace.cuni.cz/rest/handle/'
    r_url = urljoin(url, handle)
    print(r_url)
    r = requests.get(r_url)
    if r.status_code == requests.codes.ok:
        print(r.text)
        print(r.reason)
        return True
    else:
        print(r.reason)
        print(r.text)
        return False
After calling this function in the Python console with document_in_dspace('123456789/937'), the response was:
https://dspace.cuni.cz/rest/handle/123456789/937
{"id":1423,"name":"Komparace vývoje české a slovenské pravicové politiky od roku 1989 do současnosti","handle":"123456789/937","type":"item","link":"/rest/items/1423","expand":["metadata","parentCollection","parentCollectionList","parentCommunityList","bitstreams","all"],"lastModified":"2016-12-20 17:52:30.641","parentCollection":null,"parentCollectionList":null,"parentCommunityList":null,"bitstreams":null,"archived":"true","withdrawn":"false"}
OK
True
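The console version builds the URL with urljoin, which resolves the handle as a relative reference against the base URL, so the trailing slash on the base matters. A small stdlib-only illustration (the variable names are just for the example):

```python
from urllib.parse import urljoin

handle = "123456789/937"

# Base ending in "/": the handle is resolved inside /rest/handle/.
with_slash = urljoin("https://dspace.cuni.cz/rest/handle/", handle)
print(with_slash)     # https://dspace.cuni.cz/rest/handle/123456789/937

# Base without "/": the last path segment ("handle") is replaced.
without_slash = urljoin("https://dspace.cuni.cz/rest/handle", handle)
print(without_slash)  # https://dspace.cuni.cz/rest/123456789/937

# A leading "/" in the handle replaces the whole path.
rooted = urljoin("https://dspace.cuni.cz/rest/handle/", "/" + handle)
print(rooted)         # https://dspace.cuni.cz/123456789/937
```

Concatenating strings, as the updated function later does, avoids these resolution rules entirely at the cost of having to manage the slashes yourself.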
So I decided to use this function in my script (without any changes), but now the DSpace API returns response code 500 when the function is called.
Details of the implementation are below:
def get_workflow_process(document):
    if document.document_in_dspace(handle=document.handle) is True:
        return 'delete'
    else:
        return None

wf_process = get_workflow_process(document)
log.msg("Document:", document.doc_id, "Workflow process:", wf_process)
And the output is:
2017-01-04 11:08:45+0100 [-] DSPACE API response code: 500
2017-01-04 11:08:45+0100 [-] Internal Server Error
2017-01-04 11:08:45+0100 [-]
2017-01-04 11:08:45+0100 [-] False
2017-01-04 11:08:45+0100 [-] Document: 28243 Workflow process: None
Can you please suggest what might be causing this and how to solve it? I am quite surprised that it works in the Python console but not in the actual script, and I can't seem to figure it out by myself. Thank you!
I think I figured it out. The problem was probably caused by trailing newline characters in the handle parameter of the document_in_dspace function. The updated function looks like this:
import requests
from twisted.python import log  # presumably; the "[-]" log format suggests Twisted's logger

def document_in_dspace(handle):
    url = 'https://dspace.cuni.cz/rest/handle/'  # TODO: Move to config
    hdl = handle.rstrip()  # drop trailing whitespace such as the stray newline
    prefix, suffix = str(hdl).split(sep='/')
    r_url = url + prefix + '/' + suffix
    log.msg("DSpace API request url is:", r_url)
    r = requests.get(r_url, timeout=1)
    if r.status_code == requests.codes.ok:
        log.msg("DSPACE API response code:", r.status_code)
        log.msg("Document with handle", handle, "found in DSpace!")
        log.msg("Document handle:", handle)
        log.msg("Request:\n", r.request.headers)
        log.msg("\n")
        log.msg(r.reason)
        return True
    else:
        log.msg("DSPACE API response code:", r.status_code)
        log.msg("Document with handle", handle, "not found in DSpace!")
        log.msg("Document handle:", handle)
        log.msg("Request:\n", r.request.headers)
        log.msg("\n")
        log.msg(r.reason)
        return False
Basically, I called .rstrip() on the handle string to get rid of any unwanted trailing characters, then split it into its prefix and suffix parts (just to be sure) and built the request URL (r_url) by joining the parts back together.
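To see why the stray newline matters, here is a sketch assuming the handles come from line-based input such as a text file (the post does not say where they come from; the in-memory file below is a stand-in). readline() keeps the line terminator, repr() makes it visible, and urllib.parse.quote shows that a newline in a URL path can only travel percent-encoded as %0A, so if the client encodes it, the server is asked for "937\n" rather than "937".

```python
import io
from urllib.parse import quote

# Hypothetical input: an in-memory stand-in for a file with one handle per line.
fake_file = io.StringIO("123456789/937\n123456789/938\n")

raw = fake_file.readline()          # readline() keeps the trailing "\n"
print(repr(raw))                    # '123456789/937\n'
print(quote(raw, safe="/"))         # 123456789/937%0A

clean = raw.rstrip()                # removes trailing whitespace, "\n" included
prefix, suffix = clean.split("/")   # same split as in the fixed function
print(prefix, suffix)               # 123456789 937
```

Printing repr(handle) (or logging it) is a quick way to spot this kind of invisible whitespace; .strip() would additionally remove leading whitespace if that can occur too.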
I will make the function prettier in the future, but at least this now works as intended.
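The output is as follows (the log lines that break right after the handle presumably do so because those messages print the original, unstripped handle rather than hdl):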
2017-01-04 15:06:16+0100 [-] Checking if document with handle 123456789/937
is in DSpace...
2017-01-04 15:06:16+0100 [-] DSpace API request url is: https://dspace.cuni.cz/rest/handle/123456789/937
2017-01-04 15:06:16+0100 [-] DSPACE API response code: 200
2017-01-04 15:06:16+0100 [-] Document with handle 123456789/937
found in DSpace!
2017-01-04 15:06:16+0100 [-] Document handle: 123456789/937
2017-01-04 15:06:16+0100 [-] Request:
{'Accept-Encoding': 'gzip, deflate', 'User-Agent': 'python-requests/2.11.1', 'Connection': 'keep-alive', 'Accept': '*/*'}
2017-01-04 15:06:16+0100 [-]
2017-01-04 15:06:16+0100 [-] OK
Note, however, that the DSpace API seems to return response code 500, rather than 404, when no item with the given handle is present in the repository.
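Because a 500 can then mean either "no such item" or a genuine server error, one option is to reject malformed handles locally before sending the request, so that a 500 for a well-formed handle is more likely to mean the item is absent. A sketch under stated assumptions: the regex only covers numeric prefixes like 123456789 (or dotted ones like 10.1234) with a single-segment suffix, which is narrower than the full Handle syntax, and is_plausible_handle / describe_lookup are hypothetical helpers.

```python
import re

# Assumed handle shape: numeric (optionally dotted) prefix, one "/", and a
# suffix without whitespace or further slashes. Narrower than full Handle syntax.
_HANDLE_RE = re.compile(r"[0-9]+(?:\.[0-9]+)*/[^\s/]+")


def is_plausible_handle(handle: str) -> bool:
    """True if the whole string matches the assumed prefix/suffix shape."""
    return _HANDLE_RE.fullmatch(handle) is not None


def describe_lookup(handle: str, status_code: int) -> str:
    """Interpret a handle lookup; status_code is ignored for malformed handles."""
    if not is_plausible_handle(handle):
        return "malformed handle"  # would not be sent at all
    if status_code == 200:
        return "found"
    if status_code in (404, 500):
        # DSpace 5.5 was observed to answer 500 (not 404) for missing items,
        # so these two cannot be told apart from the status code alone.
        return "not found (or server error)"
    return f"unexpected status {status_code}"


print(is_plausible_handle("123456789/937"))    # True
print(is_plausible_handle("123456789/937\n"))  # False
```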