I am trying to get data from Scopus using its API and Python. I send the query with the Python requests module. The response is JSON with values like the following.
{ "search-results": { "opensearch:totalResults": "1186741", "opensearch:startIndex": "0", "opensearch:itemsPerPage": "25", "opensearch:Query": { "@role": "request", "@searchTerms": "all(machine learning)", "@startPage": "0" }, "link": [ { "@_fa": "true", "@ref": "self", "@href": "api query", "@type": "application/json" }, { "@_fa": "true", "@ref": "first", "@href": "api query", "@type": "application/json" }, { "@_fa": "true", "@ref": "next", "@href": "api query", "@type": "application/json" }, { "@_fa": "true", "@ref": "last", "@href": "api query", "@type": "application/json" } ], "entry": [ { "@_fa": "true", "link": [ { "@_fa": "true", "@ref": "self", "@href": "https://api.elsevier.com/content/abstract/scopus_id/85081889595" }, { "@_fa": "true", "@ref": "author-affiliation", "@href": "https://api.elsevier.com/content/abstract/scopus_id/85081889595?field=author,affiliation" }, { "@_fa": "true", "@ref": "scopus", "@href": "https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85081889595&origin=inward" }, { "@_fa": "true", "@ref": "scopus-citedby", "@href": "https://www.scopus.com/inward/citedby.uri?partnerID=HzOxMe3b&scp=85081889595&origin=inward" } ], "prism:url": "https://api.elsevier.com/content/abstract/scopus_id/85081889595", "dc:identifier": "SCOPUS_ID:85081889595", "eid": "2-s2.0-85081889595", "dc:title": "Recognizing hotspots in Brief Eclectic Psychotherapy for PTSD by text and audio mining", "dc:creator": "Wiegersma S.", "prism:publicationName": "European Journal of Psychotraumatology", "prism:issn": "20008198", "prism:eIssn": "20008066", "prism:volume": "11", "prism:issueIdentifier": "1", "prism:pageRange": null, "prism:coverDate": "2020-12-31", "prism:coverDisplayDate": "31 December 2020", "prism:doi": "10.1080/20008198.2020.1726672", "citedby-count": "0", "affiliation": [ { "@_fa": "true", "affilname": "University of Twente", "affiliation-city": "Enschede", "affiliation-country": "Netherlands" } ], "prism:aggregationType": "Journal", 
"subtype": "ar", "subtypeDescription": "Article", "article-number": "1726672", "source-id": "21100394256", "openaccess": "1", "openaccessFlag": true },
However, the response is nested JSON, and I am not able to access its inner elements, such as the keys dc:creator, citedby-count, etc.
Can anyone please help me access all parts of it, such as the author name, cited-by count, affiliation, etc.? I want to store the result as a CSV file that I can use for further manipulation.
Directly applying
df = pandas.read_json(file name)
doesn't yield the right format. I get a table like this:
entry [{'@_fa': 'true', 'link': [{'@_fa': 'true', '@...
link [{'@_fa': 'true', '@ref': 'self', '@href': 'ht...
opensearch:Query {'@role': 'request', '@searchTerms': 'all(mach...
opensearch:itemsPerPage 25
opensearch:startIndex 0
opensearch:totalResults 1186741
I have also tried walking the structure by hand (dictionary, then list, then dictionary again), but at some point I get stuck.
import json

with open('data.json', encoding='utf-8') as access:
    read_file = json.load(access)
…
type(read_file)
This is a dictionary, so I use dictionary syntax to go deeper; at some point the value becomes a list, and then a dictionary again.
My main requirement is: **how to create a CSV file whose column headers are the keys inside each entry (dc:identifier, dc:title, dc:creator, citedby-count, etc.), with their values as the rows?**
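import json

# response must be the JSON text (for example response.text from requests)
dict_data = json.loads(response)
# replace 'key' with a real top-level key such as 'search-results'
print(dict_data['key'])
Is this what you mean?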
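If the goal is the CSV itself, here is a minimal sketch using only the standard library. It assumes the layout shown in the question: the records live in the list under search-results -> entry, each record is a dictionary, and affiliation is a list of dictionaries. The names FIELDS, entry_to_row and write_entries_csv are made up for this example, and the column list is only the subset of keys mentioned in the question, so adjust it to what you need.

```python
import csv
import json

# Columns to export; these keys are taken from the sample response in the question.
FIELDS = ["dc:identifier", "dc:title", "dc:creator", "citedby-count"]


def entry_to_row(entry):
    """Flatten one item of search-results -> entry into a flat dict."""
    row = {key: entry.get(key) for key in FIELDS}
    # "affiliation" is a list of dicts, so join the names into a single cell.
    affiliations = entry.get("affiliation") or []
    row["affilname"] = "; ".join(a.get("affilname") or "" for a in affiliations)
    return row


def write_entries_csv(data, out_file):
    """Write every entry of a parsed search response to an open text file."""
    entries = data.get("search-results", {}).get("entry", [])
    writer = csv.DictWriter(out_file, fieldnames=FIELDS + ["affilname"])
    writer.writeheader()
    for entry in entries:
        writer.writerow(entry_to_row(entry))
    return len(entries)
```

To use it, load the file with json.load as in the question, open the output with open('scopus.csv', 'w', newline='', encoding='utf-8'), and pass both to write_entries_csv. Missing keys end up as empty cells because entry.get returns None for them. Each response holds one page of results (25 here), so you would call this once per page or collect the entry lists first.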