
Getting author's articles from Scopus using Scopus API (AUTHENTICATION_ERROR)


I've registered at http://www.developers.elsevier.com/action/devprojects, created a project and got my Scopus API key.


Now, using this key, I would like to find an author's publications by giving his/her first name, last name and subject area. I make the requests from my university network, which has access to Scopus (I have full manual access to Scopus search and use it from Firefox with no problem). However, I want to automate my Scopus mining with a simple script.

Here's my code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import requests
import json
from scopus import SCOPUS_API_KEY  # local module that holds the key


scopus_author_search_url = 'http://api.elsevier.com/content/search/author?'
headers = {'Accept': 'application/json', 'X-ELS-APIKey': SCOPUS_API_KEY}
search_query = 'query=AUTHFIRST(%s) AND AUTHLASTNAME(%s) AND SUBJAREA(%s)' % ('John', 'Kitchin', 'COMP')

# api_resource = "http://api.elsevier.com/content/search/author?apiKey=%s&" % (SCOPUS_API_KEY)

# request the first page of search results
page_request = requests.get(scopus_author_search_url + search_query, headers=headers)
print(page_request.url)

# decode the JSON response
page = json.loads(page_request.content.decode("utf-8"))
print(page)

SCOPUS_API_KEY is defined in my scopus module simply as SCOPUS_API_KEY = "xxxxxxxx".

Although I have full access to Scopus from my university network, I get this response:

{u'service-error': {u'status': {u'statusText': u'Requestor configuration settings insufficient for access to this resource.', u'statusCode': u'AUTHENTICATION_ERROR'}}}

The generated URL is http://api.elsevier.com/content/search/author?query=AUTHFIRST(John)%20AND%20AUTHLASTNAME(Kitchin)%20AND%20SUBJAREA(COMP), and when I open it in the browser, it returns this XML:

<service-error><status>
  <statusCode>AUTHORIZATION_ERROR</statusCode>
  <statusText>No APIKey provided for request</statusText>
</status></service-error>

If I instead change scopus_author_search_url to "http://api.elsevier.com/content/search/author?apiKey=%s&" % (SCOPUS_API_KEY), I get:

{u'service-error': {u'status': {u'statusText': u'Requestor configuration settings insufficient for access to this resource.', u'statusCode': u'AUTHENTICATION_ERROR'}}}

and opening the URL in the browser returns this XML:

<service-error>
<status>
<statusCode>AUTHENTICATION_ERROR</statusCode>
<statusText>Requestor configuration settings insufficient for access to this resource.</statusText>
</status>
</service-error>
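Both failures come back as a `service-error` object instead of search results. A minimal sketch of pulling the status code and message out of the decoded JSON, so a script can report the error rather than dumping the whole dict; `extract_service_error` is a hypothetical helper, and the key layout is taken only from the responses quoted above:

```python
def extract_service_error(payload):
    """Return (statusCode, statusText) if payload is a service-error, else None.

    The key layout mirrors the error responses quoted in the question.
    """
    error = payload.get("service-error")
    if error is None:
        return None
    status = error.get("status", {})
    return status.get("statusCode"), status.get("statusText")


# Decoded JSON as returned in the question (Python 2 u'' prefixes dropped).
response = {
    "service-error": {
        "status": {
            "statusText": "Requestor configuration settings insufficient for access to this resource.",
            "statusCode": "AUTHENTICATION_ERROR",
        }
    }
}
print(extract_service_error(response))
```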

What can be the cause of this problem and how can I fix it?


Solution

  • I have just registered for an API key and tested it first with this URL:

    http://api.elsevier.com/content/search/author?apikey=4xxxxxxxxxxxxxxxxxxxxxxxxxxxxx43&query=AUTHFIRST%28John%29+AND+AUTHLASTNAME%28Kitchin%29+AND+SUBJAREA%28COMP%29

    This works fine from my university network. I also tested a second API key, so I have verified two keys: one registered with a website on my university domain and one registered with the website http://apitest.example.com. This rules out the domain name used at registration as the source of your problem.
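The test URL above can be built with the standard library instead of `%s` string formatting, which takes care of encoding the query. A minimal Python 3 sketch (in Python 2 the same function lives in `urllib`); `YOUR_API_KEY` is a placeholder, not a real key:

```python
from urllib.parse import urlencode

API_KEY = "YOUR_API_KEY"  # placeholder: substitute your own key
BASE_URL = "http://api.elsevier.com/content/search/author"

params = {
    "apikey": API_KEY,
    "query": "AUTHFIRST(John) AND AUTHLASTNAME(Kitchin) AND SUBJAREA(COMP)",
}
# urlencode applies quote_plus: spaces become '+', parentheses become %28/%29,
# which is the same encoding as the test URL above.
url = BASE_URL + "?" + urlencode(params)
print(url)
```

With `requests`, passing the same dict as `params=` to `requests.get` achieves the same thing without building the string by hand.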

    I tested this:

    1. in the browser,
    2. using your Python code with the API key in the headers. The only change I made to your code was removing

      from scopus import SCOPUS_API_KEY
      

      and adding

      SCOPUS_API_KEY ='4xxxxxxxxxxxxxxxxxxxxxxxxxxxxx43'
      
    3. using your Python code, adapted to put the API key in the URL instead of the headers.

    In all cases, the query returns two authors, one at Carnegie Mellon and one at Palo Alto.

    I can't replicate your error message. If I try to use the API key from an IP address not registered with Elsevier (e.g. my home computer), I see a different error:

    <service-error>
      <status>
        <statusCode>AUTHENTICATION_ERROR</statusCode>
        <statusText>Client IP Address: xxx.yyy.aaa.bbb does not resolve to an account</statusText>
       </status>
    </service-error>
    

    If I use a random (wrong) API key from the university network, I see:

    <service-error>
        <status>
            <statusCode>AUTHORIZATION_ERROR</statusCode>
            <statusText>APIKey <mad3upa1phanum3r1ck3y> with IP address <my.uni.IP.add> is unrecognized or has insufficient privileges for access to this resource</statusText>
        </status>
    </service-error>
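Since each failure mode above produces a distinct statusText, a script can report a likely cause directly. A minimal sketch using `xml.etree.ElementTree`; the `diagnose` helper and the `KNOWN_CAUSES` table are hypothetical, a prefix-matching heuristic built only from the messages seen in this thread, not an official list of Elsevier error codes:

```python
import xml.etree.ElementTree as ET

# Heuristic mapping from statusText prefix to likely cause, based only on the
# messages observed in this thread. Order matters: more specific prefixes first.
KNOWN_CAUSES = [
    ("No APIKey provided", "the request carried no API key (e.g. a bare URL opened in a browser)"),
    ("Client IP Address", "the request came from an IP address not registered with Elsevier"),
    ("APIKey", "the API key is wrong or lacks privileges for this resource"),
    ("Requestor configuration settings insufficient", "not explained here; work through the debug steps"),
]


def diagnose(xml_text):
    """Return (statusCode, statusText, likely cause) for a service-error XML body."""
    root = ET.fromstring(xml_text)
    code = (root.findtext("status/statusCode") or "").strip()
    text = (root.findtext("status/statusText") or "").strip()
    for prefix, cause in KNOWN_CAUSES:
        if text.startswith(prefix):
            return code, text, cause
    return code, text, "unknown"


sample = """<service-error>
  <status>
    <statusCode>AUTHENTICATION_ERROR</statusCode>
    <statusText>Client IP Address: xxx.yyy.aaa.bbb does not resolve to an account</statusText>
  </status>
</service-error>"""
print(diagnose(sample))
```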
    

    Debug steps

    As I can't replicate your problem, here are some diagnostic steps you can use to track it down:

    1. Use your browser at uni to submit the API query with your key in the URL (i.e. copy the URL above, paste it into the address bar, substitute your key and see whether you get the XML back).

    2. If 1 returns the XML you expect, move on to submitting the request via Python: first, copy the exact URL straight into Python (no variable substitution via %s, no API key in the header) and simply call requests.get() on it.

    3. If 2 returns correctly, make sure your SCOPUS_API_KEY holds the exact key value, no more, no less, i.e. print(repr(SCOPUS_API_KEY)) should show exactly your API key: '4xxxxxxxxxxxxxxxxxxxxxxxxxxxxx43'. Printing the string literal 'SCOPUS_API_KEY' would only print the variable name.

    4. If 1 returns the error, it looks like your uni (for whatever reason) hasn't got access to the author search API. This doesn't make much sense given that you can search manually, but that is all I can conclude.
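Step 3 can be partly automated. A minimal sketch of a sanity check for the configured key; `key_problems` is a hypothetical helper, and it assumes a valid key contains no whitespace or quote characters. It can only catch formatting mistakes, not tell you whether the key itself is valid:

```python
def key_problems(key):
    """List reasons a configured API key string may not be the bare key value."""
    problems = []
    if not key:
        problems.append("key is empty")
        return problems
    stripped = key.strip()
    if key != stripped:
        problems.append("leading or trailing whitespace")
    if stripped[:1] in ("'", '"') or stripped[-1:] in ("'", '"'):
        problems.append("surrounding quote characters")
    if any(ch.isspace() for ch in stripped):
        problems.append("whitespace inside the key")
    return problems


SCOPUS_API_KEY = "YOUR_API_KEY"  # placeholder: use the value from your scopus module
print(repr(SCOPUS_API_KEY))  # repr makes stray spaces or newlines visible
print(key_problems(SCOPUS_API_KEY) or "looks like a bare key value")
```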

    Docs

    For reference, Elsevier documents its authentication algorithm, but the documentation is not simple to follow. You are following authentication option 1, so your method should just work.

    N.B. The API is limited to 5000 author retrievals per week. If you have run a lot of queries in a loop, even failed ones, you may have exceeded that limit.
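If failed requests do count toward the weekly limit, as the note above suggests (not confirmed here), a batch script should stop at the first error rather than keep retrying. A minimal sketch; `run_queries` and the injected `fetch` callable are hypothetical names, where `fetch` would wrap something like `requests.get(...).json()` in real use:

```python
def run_queries(queries, fetch):
    """Run queries in order, stopping at the first service-error.

    `fetch` takes a query string and returns the decoded JSON response.
    Stopping early avoids spending more quota on a configuration that
    is already failing.
    """
    results = []
    for query in queries:
        payload = fetch(query)
        if "service-error" in payload:
            status = payload["service-error"].get("status", {})
            raise RuntimeError("stopped at %r: %s" % (query, status.get("statusText")))
        results.append(payload)
    return results
```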