I'm very new to Python, so I'm not sure if this can be done, but I hope it can! I have accessed the Scopus API and managed to run a search query, which gives me the following results in a pandas DataFrame:
search-results
entry [{'@_fa': 'true', 'affiliation': [{'@_fa': 'tr...
link [{'@_fa': 'true', '@ref': 'self', '@type': 'ap...
opensearch:Query {'@role': 'request', '@searchTerms': 'AFFIL(un...
opensearch:itemsPerPage 200
opensearch:startIndex 0
opensearch:totalResults 106652
If possible, I'd like to export the 106652 results into a csv file so that they can be analysed. Is this possible at all?
First, you need to get all the results (see the comments under the question). The data you need (the search results) is inside the "entry" list. You can extract that list and append it to a support list, iterating until you have all the results. Here I loop, and at every iteration I subtract the number of downloaded items (count) from the total number of results.
import json
import os
import requests

MY_API_KEY = 'your-api-key'  # your Scopus API key
SEARCH_URL = 'https://api.elsevier.com/content/search/scopus'
query = 'AFFIL(university)'  # your search query
view = 'STANDARD'

found_items_num = 1  # sentinel: replaced by the real total on the first response
start_item = 0
items_per_query = 25
max_items = 2000
JSON = []

print('GET data from Search API...')
while found_items_num > 0:
    resp = requests.get(SEARCH_URL,
                        headers={'Accept': 'application/json', 'X-ELS-APIKey': MY_API_KEY},
                        params={'query': query, 'view': view, 'count': items_per_query,
                                'start': start_item})
    print('Current query url:\n\t{}\n'.format(resp.url))
    if resp.status_code != 200:
        # request failed: raise with the API's JSON error body
        raise Exception('ScopusSearchApi status {0}, JSON dump:\n{1}\n'.format(resp.status_code, resp.json()))
    # we set found_items_num=1 at initialization; on the first call set it to the actual total
    if found_items_num == 1:
        found_items_num = int(resp.json().get('search-results').get('opensearch:totalResults'))
        print('GET returned {} articles.'.format(found_items_num))
    if found_items_num == 0:
        pass
    else:
        # write this page of fetched JSON data to its own file
        out_file = os.path.join(str(start_item) + '.json')
        with open(out_file, 'w') as f:
            json.dump(resp.json(), f, indent=4)
        # check if the number of results exceeds the given limit
        if found_items_num > max_items:
            print('WARNING: too many results, truncating to {}'.format(max_items))
            found_items_num = max_items
        # check if the response returned any results
        if 'entry' in resp.json().get('search-results', {}):
            # combine the entries of every page into a single list
            JSON += resp.json()['search-results']['entry']
    # set counters for the next cycle
    found_items_num -= items_per_query
    start_item += items_per_query
    print('Still {} results to be downloaded'.format(found_items_num if found_items_num > 0 else 0))
# end while - finished downloading JSON data
Then, outside the while loop, you can save the complete file like this:
out_file = 'articles.json'
with open(out_file, 'w') as f:
    json.dump(JSON, f, indent=4)
Or, to convert the JSON data to a CSV, you can follow one of the many guides online (not tested; search for 'json to csv python' and you'll find plenty).
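Since the question already uses pandas, a minimal sketch of that JSON-to-CSV step could use pandas' `json_normalize`. The sample entries and field names below are made up for illustration; in practice you would load the `articles.json` written above, and your real entries will have many more Scopus fields.

```python
import json
import pandas as pd

# Stand-in for the 'articles.json' written by the loop above; replace this
# sample with your real combined 'entry' list. Field names are illustrative.
sample = [{'dc:title': 'Paper A', 'prism:doi': '10.1/a'},
          {'dc:title': 'Paper B', 'prism:doi': '10.1/b'}]
with open('articles.json', 'w') as f:
    json.dump(sample, f)

# load the combined entry list back from disk
with open('articles.json') as f:
    entries = json.load(f)

# json_normalize flattens each (possibly nested) entry dict into one row
df = pd.json_normalize(entries)
df.to_csv('articles.csv', index=False)
```

Each key of the entry dicts becomes a CSV column, with nested keys flattened using dotted names.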