Sorry in advance, since this is my first post and I'm totally new to Python coding. I want to use the NeuroMorpho API (http://neuromorpho.org/apiReference.html) to find and get information about certain neurons (I added the filters in the query line).
I used the following code:
import requests
import json
import csv
import pandas as pd
from pandas import DataFrame
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Check that the API is reachable (the bare name displays the response in a notebook)
response = requests.get("http://neuromorpho.org/api")
response

# Rat CA1 pyramidal neurons under control conditions
query = (
    "http://neuromorpho.org/api/neuron/select?q=species:rat&fq=brain_region:hippocampus, CA1&fq=experiment_condition:Control&fq=cell_type:Pyramidal, principal cell"
)
response = requests.get(query)
json_data = response.json()
rat_data = json_data
rat_data
I get a large amount of data, and at the very end it says the following:
'page': {'size': 50, 'totalElements': 1115, 'totalPages': 23, 'number': 0}}
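The `page` entry is the paging metadata: `number` appears to be the zero-based index of the page that was returned and `totalPages` the number of pages available. A minimal sketch of reading it, using a trimmed-down stand-in for `json_data` (the `sample` dict and the helper `pages_still_to_fetch` are made up for illustration):

```python
# Trimmed-down stand-in for json_data: only the 'page' metadata shown above is kept
sample = {'page': {'size': 50, 'totalElements': 1115, 'totalPages': 23, 'number': 0}}


def pages_still_to_fetch(page_info):
    """Return the zero-based page numbers that come after the page just received."""
    return list(range(page_info['number'] + 1, page_info['totalPages']))


remaining = pages_still_to_fetch(sample['page'])
print(len(remaining), remaining[0], remaining[-1])  # 22 1 22
```

So with 23 pages in total and page 0 already received, pages 1 to 22 are still missing.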
Then I wanted to create a dictionary from that data and used the following code:
df_dict = {}
df_dict['NeuronID'] = []
df_dict['Archive'] = []
df_dict['Strain'] = []
df_dict['Cell'] = []
df_dict['Region'] = []
for i in rat_data['_embedded']['neuronResources']:
    df_dict['NeuronID'].append(str(i['neuron_id']))
    df_dict['Archive'].append(str(i['archive']))
    df_dict['Strain'].append(str(i['strain']))
    df_dict['Cell'].append(str(i['cell_type']))
    df_dict['Region'].append(str(i['brain_region']))
rat_df = DataFrame(df_dict)
print(rat_df)
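The five parallel `append` calls can also be driven by a single mapping from output column to JSON key. A minimal sketch, assuming each entry of `neuronResources` is a dict with the keys used above; `FIELDS`, `rows_to_columns` and the two sample rows (copied from the output shown further down) are hypothetical illustrations:

```python
# Output column name -> key in each neuronResources entry (same keys as in the loop above)
FIELDS = {
    'NeuronID': 'neuron_id',
    'Archive': 'archive',
    'Strain': 'strain',
    'Cell': 'cell_type',
    'Region': 'brain_region',
}


def rows_to_columns(rows, fields=FIELDS):
    """Turn a list of neuron dicts into a dict of column lists suitable for DataFrame()."""
    return {column: [str(row[key]) for row in rows] for column, key in fields.items()}


# Two sample entries shaped like rat_data['_embedded']['neuronResources']
sample_rows = [
    {'neuron_id': 100, 'archive': 'Turner', 'strain': 'Fischer 344',
     'cell_type': ['pyramidal', 'principal cell'], 'brain_region': ['hippocampus', 'CA1']},
    {'neuron_id': 1016, 'archive': 'Ascoli', 'strain': 'Sprague-Dawley',
     'cell_type': ['pyramidal', 'principal cell'], 'brain_region': ['hippocampus']},
]

columns = rows_to_columns(sample_rows)
print(columns['NeuronID'])  # ['100', '1016']
```

`DataFrame(rows_to_columns(rat_data['_embedded']['neuronResources']))` should then give the same table as the loop above, and adding a column only needs one more `FIELDS` entry.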
Finally, when I checked the length of the resulting DataFrame:
len(rat_df)
The output was 50.
So I figured out that the program only pulled the first 50 neurons, i.e. the first page (page 0). According to the output above there are 23 pages in total, so 22 more pages remain. How can I put all of those results into one dictionary or class, i.e. is there any way to loop through all of those pages? I have tried several loop variants but didn't have any success.
Sorry if this is an easy question or if I have made some mistake, but I have been trying everything for the past couple of days and I'm not getting any results.
Disclaimer: I'm not an expert with HTTP or the Requests library and hadn't used neuromorpho.org before, so please take this with a grain of salt.
You can read the number of pages from the first response and then loop through the individual pages. In the loop, include the requested page as a parameter of the HTTP GET request (e.g. ?page=42&...), like this:
import requests
import pandas as pd

url = 'http://neuromorpho.org/api/neuron/select'
params = {
    'page' : 0,
    'q' : 'species:rat',
    'fq' : [
        'brain_region:hippocampus,CA1',
        'experiment_condition:Control',
        'cell_type:Pyramidal,principal cell' ] }

# The first request is only used to read the number of pages
totalPages = requests.get(url, params).json()['page']['totalPages']

df_dict = {
    'NeuronID' : list(),
    'Archive' : list(),
    'Strain' : list(),
    'Cell' : list(),
    'Region' : list() }

for pageNum in range(totalPages):
    params['page'] = pageNum
    response = requests.get(url, params)
    print('Querying page {} -> status code: {}'.format(
        pageNum, response.status_code))
    if response.status_code == 200:  # only parse successful requests
        data = response.json()
        # iterate over the objects actually returned; the last page has fewer than 50
        for row in data['_embedded']['neuronResources']:
            df_dict['NeuronID'].append(str(row['neuron_id']))
            df_dict['Archive'].append(str(row['archive']))
            df_dict['Strain'].append(str(row['strain']))
            df_dict['Cell'].append(str(row['cell_type']))
            df_dict['Region'].append(str(row['brain_region']))

rat_df = pd.DataFrame(df_dict)
print(rat_df)
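The loop above requests page 0 twice (once to read `totalPages`, once inside the loop). A variation is a generator that fetches page 0 once, reads `totalPages` from it, and then yields the rows of every page. This is only a sketch: `iter_neurons`, `fetch_page` and `fake_fetch` are hypothetical names, and `fetch_page` stands for any function that returns the decoded JSON of one page, so the paging logic can be tried without network access:

```python
def iter_neurons(fetch_page):
    """Yield every entry of '_embedded' -> 'neuronResources' across all pages.

    fetch_page(page_num) must return the decoded JSON dict for that page.
    """
    first = fetch_page(0)
    total_pages = first['page']['totalPages']
    for page_num in range(total_pages):
        # reuse the first response instead of requesting page 0 again
        data = first if page_num == 0 else fetch_page(page_num)
        # iterate over what the page actually contains, not over 'size'
        yield from data['_embedded']['neuronResources']


def fake_fetch(page_num, total=1115, size=50):
    """Offline stand-in that mimics the paging metadata of the real response."""
    start = page_num * size
    ids = range(start, min(start + size, total))
    return {
        '_embedded': {'neuronResources': [{'neuron_id': i} for i in ids]},
        'page': {'size': size, 'totalElements': total,
                 'totalPages': -(-total // size), 'number': page_num},
    }


neurons = list(iter_neurons(fake_fetch))
print(len(neurons))  # 1115
```

With Requests, `fetch_page` could be `lambda n: requests.get(url, {**params, 'page': n}).json()`, reusing `url` and `params` from the code above. Unlike the loop above, this sketch does not check `status_code`; calling `response.raise_for_status()` in a small fetch function before `.json()` would make a failed page raise instead of silently disappearing.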
You can see the resulting DataFrame and how the requested page number changes in the console output:
Querying page 0 -> status code: 200
Querying page 1 -> status code: 200
Querying page 2 -> status code: 200
Querying page 3 -> status code: 200
Querying page 4 -> status code: 200
Querying page 5 -> status code: 200
Querying page 6 -> status code: 200
Querying page 7 -> status code: 200
Querying page 8 -> status code: 200
Querying page 9 -> status code: 200
Querying page 10 -> status code: 200
Querying page 11 -> status code: 200
Querying page 12 -> status code: 200
Querying page 13 -> status code: 200
Querying page 14 -> status code: 200
Querying page 15 -> status code: 200
Querying page 16 -> status code: 200
Querying page 17 -> status code: 200
Querying page 18 -> status code: 200
Querying page 19 -> status code: 200
Querying page 20 -> status code: 200
Querying page 21 -> status code: 200
Querying page 22 -> status code: 200
NeuronID Archive Strain Cell Region
0 100 Turner Fischer 344 ['pyramidal', 'principal cell'] ['hippocampus', 'CA1']
1 101 Turner Fischer 344 ['pyramidal', 'principal cell'] ['hippocampus', 'CA1']
2 1016 Ascoli Sprague-Dawley ['pyramidal', 'principal cell'] ['hippocampus']
3 1019 Ascoli Sprague-Dawley ['pyramidal', 'principal cell'] ['hippocampus']
4 102 Turner Fischer 344 ['pyramidal', 'principal cell'] ['hippocampus', 'CA1']
... ... ... ... ... ...
1110 99614 Guizzetti Sprague-Dawley ['principal cell', 'pyramidal'] ['hippocampus', 'CA1', 'left']
1111 99615 Guizzetti Sprague-Dawley ['principal cell', 'pyramidal'] ['hippocampus', 'CA1', 'left']
1112 99616 Guizzetti Sprague-Dawley ['principal cell', 'pyramidal'] ['hippocampus', 'CA1', 'left']
1113 99617 Guizzetti Sprague-Dawley ['principal cell', 'pyramidal'] ['hippocampus', 'CA1', 'left']
1114 99618 Guizzetti Sprague-Dawley ['principal cell', 'pyramidal'] ['hippocampus', 'CA1', 'left']
[1115 rows x 5 columns]
I changed my posted code by adding a modified version of your code for parsing the responses in the loop. Note that the neuromorpho.org API reports size: 50 even for the last page (number 22), although that page contains only 15 objects (indices 0-14) in the JSON response; size seems to describe the page size rather than the number of objects on that page. You can avoid relying on it by iterating over the JSON objects themselves and ignoring the reported size.
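The numbers are consistent: with 50 objects per page, 1115 objects fill 22 full pages plus 15 objects on the last one. A minimal sketch of that arithmetic (`objects_on_page` is a hypothetical helper, not part of the API):

```python
def objects_on_page(page_num, total_elements=1115, page_size=50):
    """Number of objects a zero-based page should contain."""
    remaining = total_elements - page_num * page_size
    return max(0, min(page_size, remaining))


total_pages = -(-1115 // 50)  # ceiling division -> 23
print(total_pages, objects_on_page(0), objects_on_page(22))  # 23 50 15
```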
I also realized that the GET parameters don't have to be encoded into the URL by hand: Requests does that for us when we pass them as a dict (I updated the code accordingly).
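A list value such as the `fq` filters ends up as one `fq=` pair per element in the query string. The standard library's `urllib.parse.urlencode(..., doseq=True)` produces a similar query string and is a convenient way to see roughly what gets sent; treat it as an approximation of what Requests does, not as its exact code path:

```python
from urllib.parse import urlencode

# Same parameters as in the code above, with an example page number
params = {
    'page': 42,
    'q': 'species:rat',
    'fq': [
        'brain_region:hippocampus,CA1',
        'experiment_condition:Control',
        'cell_type:Pyramidal,principal cell'],
}

# doseq=True expands the list into repeated fq=... pairs
query = urlencode(params, doseq=True)
print('http://neuromorpho.org/api/neuron/select?' + query)
```

After a real call such as `requests.get(url, params)`, the URL that was actually requested can be inspected via `response.url`.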
I hope this helps!