python, json, api, python-requests, chunking

Using chunking with json/requests to get large data into Python


I am trying to pull a large dataset into Python through an API, but I cannot get the entire dataset: each request retrieves only the first 1000 rows.

import pandas as pd
import requests

r = requests.get("https://data.cityofchicago.org/resource/6zsd-86xi.json")

data = r.json()  # renamed from "json", which would shadow the json module
df = pd.DataFrame(data)
df.drop(df.columns[[0, 1, 2, 3, 4, 5, 6, 7]], axis=1, inplace=True)  # dropping some columns
df.shape

Output is

(1000, 22)

The dataset contains almost 6 million rows, yet only 1000 are retrieved. How do I get around this? Is chunking the right option? Can someone please help me with the code?

Thanks.


Solution

  • You'll need to paginate through the results to get the entire dataset. Most APIs limit the number of results returned in a single request. According to the Socrata docs, you need to add $limit and $offset parameters to the request URL.

    For example, for the first page of results you would start with - https://data.cityofchicago.org/resource/6zsd-86xi.json?$limit=1000&$offset=0

    Then for the next page you would just increment the offset - https://data.cityofchicago.org/resource/6zsd-86xi.json?$limit=1000&$offset=1000

    Continue incrementing the offset until a request comes back empty, at which point you have the entire dataset; a sketch of the loop follows below.
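
    Since the answer stops at the idea, here is a minimal sketch of that loop in Python. It assumes the endpoint from the question; the $order=:id parameter is an addition based on the Socrata docs' recommendation to sort on the :id system field so pages stay stable while paging, and the page size of 1000 is just illustrative.

    import pandas as pd
    import requests

    url = "https://data.cityofchicago.org/resource/6zsd-86xi.json"
    page_size = 1000
    offset = 0
    frames = []

    while True:
        r = requests.get(url, params={
            "$limit": page_size,
            "$offset": offset,
            "$order": ":id",  # stable sort so pages don't overlap or skip rows
        })
        r.raise_for_status()
        page = r.json()
        if not page:  # an empty page means the offset is past the end of the data
            break
        frames.append(pd.DataFrame(page))
        offset += page_size

    df = pd.concat(frames, ignore_index=True)
    print(df.shape)

    Note that at 1000 rows per request, ~6 million rows means ~6000 requests. If the endpoint supports SODA 2.1 you can likely raise $limit well above 1000 (the docs mention limits up to 50,000), and registering an app token helps avoid throttling; both are worth verifying against the current Socrata documentation.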