Tags: python, json, api, python-requests, python-requests-html

NeuroMorpho.org - getting results from multiple API pages


Sorry in advance: this is my first post and I'm totally new to Python. I want to use the NeuroMorpho API (http://neuromorpho.org/apiReference.html) to find and retrieve information about certain neurons (the filters are in the query string).

I used the following code:

import requests
import json
import csv
import pandas as pd
from pandas import DataFrame
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

response = requests.get("http://neuromorpho.org/api")
response

query = (
    "http://neuromorpho.org/api/neuron/select?q=species:rat&fq=brain_region:hippocampus, CA1&fq=experiment_condition:Control&fq=cell_type:Pyramidal, principal cell"
)

response = requests.get(query)
json_data = response.json()
rat_data = json_data
rat_data

I get a large amount of data, and at the very end it says:

'page': {'size': 50, 'totalElements': 1115, 'totalPages': 23, 'number': 0}}
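Those numbers already explain the 50-row result: 1115 matching neurons split into pages of 50 give 23 pages (numbered 0 to 22), and a single request returns only one of them. A minimal arithmetic check that uses nothing but the values printed above:

```python
import math

# Pagination metadata copied from the end of the first response (page 0)
page = {'size': 50, 'totalElements': 1115, 'totalPages': 23, 'number': 0}

# 1115 neurons at 50 per page need ceil(1115 / 50) = 23 pages, numbered 0 to 22
pages_needed = math.ceil(page['totalElements'] / page['size'])

# Whatever does not fill the first 22 pages ends up on the last page
last_page_count = page['totalElements'] - (page['totalPages'] - 1) * page['size']

print(pages_needed, last_page_count)
```

So every page must be requested separately, and the last one holds fewer than 50 neurons.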

Then I wanted to create a dictionary from that data and used the following code:

df_dict = {}
df_dict['NeuronID'] = []
df_dict['Archive'] = []
df_dict['Strain'] = []
df_dict['Cell'] = []
df_dict['Region'] = []
for i in rat_data['_embedded']['neuronResources']:
    df_dict['NeuronID'].append(str(i['neuron_id']))
    df_dict['Archive'].append(str(i['archive']))
    df_dict['Strain'].append(str(i['strain']))
    df_dict['Cell'].append(str(i['cell_type']))
    df_dict['Region'].append(str(i['brain_region']))

rat_df = DataFrame(df_dict)
print(rat_df)

Finally, when I checked the length of the resulting DataFrame:

len(rat_df)

The output was 50.

So I concluded that the program pulled only the first 50 neurons, i.e. the first page (page 0). According to the output at the beginning there are 23 pages in total, so 22 are still missing. How can I put all of those results into one dictionary (or DataFrame), i.e. is there any way to iterate through all of those pages? I have tried several loops but didn't have any success.

Sorry if this is an easy question or if I have made a mistake, but I have been trying everything for the past couple of days without getting any result.


Solution

  • Disclaimer: I'm not an expert with HTTP or the Requests library and didn't use neuromorpho.org before, so please take this with a grain of salt.

    You can read the total number of pages from the first response and then loop through the individual pages. In each iteration, pass the requested page number as the page parameter of the HTTP GET request (e.g. ?page=42&...), like this:

    import requests
    import pandas as pd

    url = 'http://neuromorpho.org/api/neuron/select'
    params = {
            'page': 0,
            'q': 'species:rat',
            'fq': [    # a list is sent as repeated fq=... parameters
                'brain_region:hippocampus,CA1',
                'experiment_condition:Control',
                'cell_type:Pyramidal,principal cell' ] }

    # The first request only serves to read the total number of pages
    totalPages = requests.get(url, params=params).json()['page']['totalPages']

    # One list per DataFrame column
    df_dict = {
            'NeuronID': list(),
            'Archive': list(),
            'Strain': list(),
            'Cell': list(),
            'Region': list() }

    for pageNum in range(totalPages):
        params['page'] = pageNum    # request pages 0, 1, ..., totalPages - 1
        response = requests.get(url, params=params)
        print('Querying page {} -> status code: {}'.format(
            pageNum, response.status_code))
        if response.status_code == 200:    # only parse successful requests
            data = response.json()
            # Iterate over the objects actually returned instead of trusting page['size']
            for row in data['_embedded']['neuronResources']:
                df_dict['NeuronID'].append(str(row['neuron_id']))
                df_dict['Archive'].append(str(row['archive']))
                df_dict['Strain'].append(str(row['strain']))
                df_dict['Cell'].append(str(row['cell_type']))
                df_dict['Region'].append(str(row['brain_region']))

    rat_df = pd.DataFrame(df_dict)
    print(rat_df)
    

    You can see the resulting DataFrame and how the requested page number changes in the console output:

    Querying page 0 -> status code: 200
    Querying page 1 -> status code: 200
    Querying page 2 -> status code: 200
    Querying page 3 -> status code: 200
    Querying page 4 -> status code: 200
    Querying page 5 -> status code: 200
    Querying page 6 -> status code: 200
    Querying page 7 -> status code: 200
    Querying page 8 -> status code: 200
    Querying page 9 -> status code: 200
    Querying page 10 -> status code: 200
    Querying page 11 -> status code: 200
    Querying page 12 -> status code: 200
    Querying page 13 -> status code: 200
    Querying page 14 -> status code: 200
    Querying page 15 -> status code: 200
    Querying page 16 -> status code: 200
    Querying page 17 -> status code: 200
    Querying page 18 -> status code: 200
    Querying page 19 -> status code: 200
    Querying page 20 -> status code: 200
    Querying page 21 -> status code: 200
    Querying page 22 -> status code: 200
         NeuronID    Archive          Strain                             Cell                          Region
    0         100     Turner     Fischer 344  ['pyramidal', 'principal cell']          ['hippocampus', 'CA1']
    1         101     Turner     Fischer 344  ['pyramidal', 'principal cell']          ['hippocampus', 'CA1']
    2        1016     Ascoli  Sprague-Dawley  ['pyramidal', 'principal cell']                 ['hippocampus']
    3        1019     Ascoli  Sprague-Dawley  ['pyramidal', 'principal cell']                 ['hippocampus']
    4         102     Turner     Fischer 344  ['pyramidal', 'principal cell']          ['hippocampus', 'CA1']
    ...       ...        ...             ...                              ...                             ...
    1110    99614  Guizzetti  Sprague-Dawley  ['principal cell', 'pyramidal']  ['hippocampus', 'CA1', 'left']
    1111    99615  Guizzetti  Sprague-Dawley  ['principal cell', 'pyramidal']  ['hippocampus', 'CA1', 'left']
    1112    99616  Guizzetti  Sprague-Dawley  ['principal cell', 'pyramidal']  ['hippocampus', 'CA1', 'left']
    1113    99617  Guizzetti  Sprague-Dawley  ['principal cell', 'pyramidal']  ['hippocampus', 'CA1', 'left']
    1114    99618  Guizzetti  Sprague-Dawley  ['principal cell', 'pyramidal']  ['hippocampus', 'CA1', 'left']
    
    [1115 rows x 5 columns]
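If you would rather not request page 0 twice (once for totalPages, once inside the loop), the paging can be wrapped in a generator that reads totalPages from each response as it goes. This is a minimal sketch, assuming the response shape shown in the question; iter_neurons and fetch_page are hypothetical names, and the fake fetcher only exists so the sketch runs offline:

```python
from typing import Callable, Iterator


def iter_neurons(fetch_page: Callable[[int], dict]) -> Iterator[dict]:
    """Yield every neuron record across all pages.

    fetch_page is a hypothetical callable: it takes a page number and
    returns that page's parsed JSON.
    """
    page_num = 0
    while True:
        data = fetch_page(page_num)
        # Assumes the shape shown in the question: records under
        # '_embedded' -> 'neuronResources', paging info under 'page'
        yield from data.get('_embedded', {}).get('neuronResources', [])
        page_num += 1
        # totalPages comes from the page just read, so page 0 is fetched only once
        if page_num >= data['page']['totalPages']:
            break


# Offline stand-in that imitates 1115 results split into pages of 50
requested = []


def fake_fetch(page_num):
    requested.append(page_num)
    size, total = 50, 1115
    start = page_num * size
    items = [{'neuron_id': i} for i in range(start, min(start + size, total))]
    return {
        '_embedded': {'neuronResources': items},
        'page': {'size': size, 'totalElements': total,
                 'totalPages': 23, 'number': page_num},
    }


records = list(iter_neurons(fake_fetch))
print(len(records), requested[-1])  # 1115 22
```

With Requests, fetch_page could be something like lambda n: requests.get(url, params={**params, 'page': n}, timeout=30).json(), reusing url and params from the code above; pd.DataFrame(records) then builds a DataFrame with one column per JSON field. Unlike the loop above, this sketch does not skip pages with a non-200 status code, so you may want to call response.raise_for_status() inside fetch_page.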
    
    

    Update #1:

    I changed my posted code by adding a modified version of your code for parsing the responses in the loop. Note that the API responds with size: 50 for the last page (number 22) as well, although that page contains only 15 objects (index 0-14) in the JSON response. size most likely denotes the page size (the maximum number of objects per page) rather than the number of objects on that particular page, so it should not be used as a loop bound. You can sidestep the issue by iterating over the objects in the JSON response and ignoring the reported size.
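Iterating over the objects rather than trusting the reported size can also be pulled into a small helper. This is a sketch, not the answer's exact code: rows_from_page and to_text are hypothetical names, the field names are the ones used in the code above, and joining list-valued fields into a comma-separated string is a swapped-in choice instead of calling str() on the list. The sample page is made up purely to exercise the function:

```python
def to_text(value):
    """Turn a field value into a string; list-valued fields are joined with ', '."""
    # cell_type and brain_region come back as lists, e.g. ['pyramidal', 'principal cell']
    if isinstance(value, list):
        return ', '.join(str(v) for v in value)
    return str(value)


def rows_from_page(data):
    """Return one row per neuron actually present in the page; page['size'] is never used."""
    rows = []
    # Assumes the response shape from the question; a missing '_embedded' key yields no rows
    for record in data.get('_embedded', {}).get('neuronResources', []):
        rows.append({
            'NeuronID': to_text(record['neuron_id']),
            'Archive': to_text(record['archive']),
            'Strain': to_text(record['strain']),
            'Cell': to_text(record['cell_type']),
            'Region': to_text(record['brain_region']),
        })
    return rows


# Made-up stand-in for the last page: it reports size 50 but holds only 15 objects
last_page = {
    '_embedded': {'neuronResources': [
        {'neuron_id': n, 'archive': 'Example', 'strain': 'Example strain',
         'cell_type': ['pyramidal', 'principal cell'],
         'brain_region': ['hippocampus', 'CA1']}
        for n in range(15)
    ]},
    'page': {'size': 50, 'totalElements': 1115, 'totalPages': 23, 'number': 22},
}

rows = rows_from_page(last_page)
print(len(rows))  # 15
```

Collecting rows from all pages into one list and passing it to pd.DataFrame gives the same columns as df_dict, with more readable Cell and Region values.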

    Update #2:

    I realized that the GET parameters don't have to be encoded in the URL by hand: Requests does that for us when they are passed as a dict (I updated the code accordingly).
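To see roughly what Requests builds from the params dict, the query string can be reproduced with the standard library's urllib.parse.urlencode. This is a sketch of the encoding (values percent-encoded, a list turned into repeated keys), not a call into Requests itself:

```python
from urllib.parse import urlencode

url = 'http://neuromorpho.org/api/neuron/select'
params = {
    'page': 0,
    'q': 'species:rat',
    'fq': ['brain_region:hippocampus,CA1',
           'experiment_condition:Control',
           'cell_type:Pyramidal,principal cell'],
}

# doseq=True turns the 'fq' list into three separate fq=... pairs;
# ':' and ',' are percent-encoded and spaces become '+'
query_string = urlencode(params, doseq=True)
full_url = url + '?' + query_string
print(full_url)
```

To inspect the exact URL Requests would send without sending it, you can print requests.Request('GET', url, params=params).prepare().url.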

    I hope this helps!