python json europepmc

Formatting JSON GET results in Python


I am trying to get Covid-19 JSON data from Europe PubMed Central (Europe PMC). The JSON results returned by the Europe PMC server look like this.

My initial code querying the server looks like this:

import requests
import json


mydata = "https://www.ebi.ac.uk/europepmc/webservices/rest/search?query=(%E2%80%9C2019-nCoV%E2%80%9D)&format=json"

# Get the server response
reply = requests.get(mydata)

# Print out the results
print(reply.json())

I wish to get rid of this part of the JSON:

{'version': '6.2', 'hitCount': 847, 'nextCursorMark': 'AoIIQVJxdCg0MTI2NjU3Mw==', 'request': {'queryString': '(“2019-nCoV”)', 'resultType': 'lite', 'cursorMark': '*', 'pageSize': 25, 'sort': '', 'synonym': False}, 'resultList':

How can I get rid of this part in Python? I apologize in advance for the long URL query string.


Solution

  • I would recommend simply doing

    reply = reply.json()['resultList']
    

    Then reply would consist only of

    {
    "result": [
      {
        "id": "32036774",
        "source": "MED",
        "pmid": "32036774",
        "pmcid": "PMC7054940",
        "doi": "10.1080/01652176.2020.1727993",
        "title": "Emerging novel coronavirus (2019-nCoV)-current scenario, evolutionary perspective based on genome analysis and recent developments.",
        "authorString": "Malik YS, Sircar S, Bhat S, Sharun K, Dhama K, Dadar M, Tiwari R, Chaicumpa W.",
        "journalTitle": "Vet Q",
        "issue": "1",
        "journalVolume": "40",
        "pubYear": "2020",
        "journalIssn": "0165-2176; 1875-5941; ",
        "pageInfo": "68-76",
        "pubType": "other; review; journal article",
        "isOpenAccess": "Y",
        "inEPMC": "Y",
        "inPMC": "N",
        "hasPDF": "Y",
        "hasBook": "N",
        "hasSuppl": "Y",
        "citedByCount": 0,
        "hasReferences": "N",
        "hasTextMinedTerms": "Y",
        "hasDbCrossReferences": "N",
        "hasLabsLinks": "Y",
        "hasTMAccessionNumbers": "Y",
        "tmAccessionTypeList": {
          "accessionType": [
            "gen"
          ]
        },
        "firstIndexDate": "2020-02-11",
        "firstPublicationDate": "2020-12-01"
      }, ...
      ]
    }
    

    From there you could iterate over all the objects like so:

    for result in reply['result']:
        print(result)
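
Putting the steps above together, here is a minimal sketch of how the extraction might look. It assumes the response has the shape shown in the question (a top-level `resultList` object holding a `result` list). `SAMPLE_BODY` is a trimmed stand-in for what `reply.json()` would return, and `extract_results` and `summarize` are hypothetical helper names, not part of `requests` or the Europe PMC API.

```python
import json

# Trimmed stand-in for the body returned by the Europe PMC search endpoint;
# in the real script the parsed dict would come from reply.json().
SAMPLE_BODY = """
{
  "version": "6.2",
  "hitCount": 847,
  "resultList": {
    "result": [
      {"id": "32036774", "source": "MED", "pubYear": "2020"}
    ]
  }
}
"""


def extract_results(payload):
    """Return the list of records under resultList.result, or [] if absent."""
    return payload.get("resultList", {}).get("result", [])


def summarize(records, fields=("id", "source", "pubYear")):
    """Keep only the chosen fields from each record (hypothetical helper)."""
    return [{key: record.get(key) for key in fields} for record in records]


if __name__ == "__main__":
    payload = json.loads(SAMPLE_BODY)
    records = extract_results(payload)
    # json.dumps with indent gives a readable, pretty-printed view
    print(json.dumps(summarize(records), indent=2))
```

Using `.get()` with defaults means a response with no hits yields an empty list rather than a `KeyError`, which may or may not be the behavior you want.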