Search code examples
pythonapiodata

Python: how to extract data from Odata API that contains pages @odata.nextLink


I need to pull data from an Odata API. With code below I do receive data, but only 250 rows.

The JSON contains a key called: @odata.nextLink that contains one value, this is the BASE_URL + endpoint + ?$skip=250

How can I loop through the next pages?

import requests
import pandas as pd
import json

BASE_URL = "base_url"


def session_token():
    url = BASE_URL + '/api/oauth/token'

    headers = {"Accept": "application\json",
                "Content-Type": "application/x-www-form-urlencoded;charset=UTF-8"}

    body = {"username":"user",
            "password": "pwd",
            "grant_type": "password"}

    return "Bearer "+ requests.post(url, headers = headers, data = body).json()["access_token"]

def make_request(endpoint, token = session_token()):
    headers = {"Authorization": token}
    response = requests.get(BASE_URL + endpoint, headers = headers)
    if response.status_code == 200:
        json_data = json.loads(response.text)
        return json_data


make_request("/odata/endpoint")

Following @Marek Piotrowski's advise I modified and came to a solution:

def main():
    url = "endpoint"
    while True:
        if not url:
            break
        response = make_request("endpoint")
        if response.status_code == 200:
            json_data = json.loads(response.text)
            url = json_data["@odata.nextLink"] # Fetch next link
            yield json_data['value']

result = pd.concat((json_normalize(row) for row in main()))
print(result) # Final dataframe, works like a charm :)

Solution

  • Something like that would retrieve all records, I believe (assuming there's @odata.nextLink in json_data indeed):

    def retrieve_all_records(endpoint, token = session_token()):
        all_records = []
        headers = {"Authorization": token}
        url = BASE_URL + endpoint
        while True:
            if not url:
                break
            response = requests.get(url, headers = headers)
            if response.status_code == 200:
                json_data = json.loads(response.text)
                all_records = all_records + json_data['records']
                url = json_data['@odata.nextLink']
        return all_records
    

    The code is untested, though. Let me know if it works. Alternatively, you could make some recursive call to make_request, I believe, but you'd have to store results somewhere above the function itself then.