I am trying to load a large dataset into Python through an API, but I cannot get all of it. The request returns only the first 1000 rows.
import pandas as pd
import requests

r = requests.get("https://data.cityofchicago.org/resource/6zsd-86xi.json")
data = r.json()  # parse the JSON body into a list of dicts
df = pd.DataFrame(data)
df.drop(df.columns[[0,1,2,3,4,5,6,7]], axis=1, inplace=True)  # drop the first eight columns
df.shape
Output is
(1000,22)
The dataset contains almost 6 million rows, yet only 1000 are retrieved. How do I get around this? Is chunking the right approach? Can someone please help me with the code?
Thanks.
You'll need to paginate through the results to get the entire dataset. Most APIs limit the number of results returned in a single request. According to the Socrata docs, you need to add the $limit and $offset parameters to the request URL.
For example, for the first page of results you would start with:
https://data.cityofchicago.org/resource/6zsd-86xi.json?$limit=1000&$offset=0
Then for the next page you would just increment the offset:
https://data.cityofchicago.org/resource/6zsd-86xi.json?$limit=1000&$offset=1000
Continue incrementing the offset by the limit until a request comes back with no rows; at that point you have the entire dataset.
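The paging loop described above could be sketched roughly as follows. This is only a minimal sketch, not a tested client for this dataset: the function names (iter_pages, socrata_page, fetch_all) are made up for illustration, the 60-second timeout is arbitrary, and the $order=:id parameter is an extra I am assuming is wanted so that rows keep a stable order between requests. It also assumes the endpoint returns a JSON array and returns an empty array once the offset passes the last row.

```python
def iter_pages(get_page, page_size=1000):
    """Yield successive pages from get_page(limit, offset) until one comes back empty."""
    offset = 0
    while True:
        rows = get_page(page_size, offset)
        if not rows:
            break  # an empty page means we have gone past the last row
        yield rows
        # Advance by what was actually returned, in case the server
        # caps the page at fewer rows than we asked for.
        offset += len(rows)


def socrata_page(url, limit, offset):
    """Fetch one page of rows as a list of dicts (hypothetical helper)."""
    import requests  # third-party; imported here so iter_pages has no dependencies

    # "$order": ":id" is an assumption, added to keep paging stable.
    params = {"$limit": limit, "$offset": offset, "$order": ":id"}
    response = requests.get(url, params=params, timeout=60)
    response.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error body
    return response.json()


def fetch_all(url, page_size=1000):
    """Collect every page into a single list of rows."""
    rows = []
    for page in iter_pages(lambda limit, offset: socrata_page(url, limit, offset), page_size):
        rows.extend(page)
    return rows


# Usage (downloads the whole dataset, so expect it to take a while):
# df = pd.DataFrame(fetch_all("https://data.cityofchicago.org/resource/6zsd-86xi.json"))
```

With almost 6 million rows and 1000 rows per page, that is roughly 6000 requests. If the endpoint accepts a larger $limit, passing a bigger page_size would cut that number down.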