I am passing an ID value to an API URL to get a JSON response, but only the first request succeeds and all the others return 500 errors. I collect the IDs in a list and pass each ID to the API URL as a parameter in a while loop to extract the data.
## Get id in a variable ##
df_filter = spark.sql("""select distinct ID from filter_view""")
rdd = df_filter.rdd
listOfRows = rdd.collect()
counter = 0
##total_results = []
while counter < len(listOfRows):
    url += '?ids=' + listOfRows[counter].ID
    response = requests.get(url, headers=headers)
    if response.status_code == 200:
        json_response = response.json()
        ##total_results.append(json_response)
        df2 = pd.json_normalize(json_response, record_path=['entities'])
        display(df2)
    else:
        print("Error: HTTP status code " + str(response.status_code))
    counter += 1
I am getting output for only the first ID; all the rest end with 500 errors.
Desired output:

ID  ItemID  Details
1   100     text
1   101     text
2   200     text
2   300     text
3   400     sometext
3   500     sometext
Output I am getting:

ID  ItemID  Details
1   100     text
1   101     text
Error: HTTP status code 500
Error: HTTP status code 500
Error: HTTP status code 500
Error: HTTP status code 500
Error: HTTP status code 500
Error: HTTP status code 500
The first iteration produces a valid URL, baseURL/?ids=1, but since the URL is built with concatenation and assignment (url +=), the second iteration produces baseURL/?ids=1?ids=2 when you want baseURL/?ids=2. Instead, build the full URL from the base on every iteration:
while counter < len(listOfRows):
    # rebuild the URL from the base each time instead of appending to it
    response = requests.get(f'{url}?ids={listOfRows[counter].ID}', headers=headers)
Does the API support GETting multiple resources in a single request? Typically, with a plural query parameter like ids, it will take either a comma-separated list of resource IDs (?ids=1,2,3) or an array (?ids[]=1&ids[]=2&ids[]=3, or ?ids=1&ids=2&ids=3). If so, it will be far more efficient, and more polite to the API provider, to make one such request.
response = requests.get(
    url + '?ids=' + ','.join([row.ID for row in listOfRows]),
    headers=headers
)
You'll probably need to change the code to parse the new response.
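If the API instead expects the repeated-parameter form (?ids=1&ids=2&ids=3), you can let requests build the query string for you by passing a list in params rather than concatenating it yourself. A minimal sketch, assuming the same url, headers, and listOfRows as above:

# requests encodes a list value as repeated parameters: ?ids=1&ids=2&ids=3
response = requests.get(
    url,
    params={'ids': [row.ID for row in listOfRows]},
    headers=headers
)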
If multiple GET isn't supported, at least convert this to a for loop. There's no need to keep track of counter and test counter < len(listOfRows), and it will improve readability.
df_filter = spark.sql("""select distinct ID from filter_view""")
rdd = df_filter.rdd
listOfRows = rdd.collect()

for row in listOfRows:
    response = requests.get(f'{url}?ids={row.ID}', headers=headers)
    if response.status_code == 200:
        json_response = response.json()
        df2 = pd.json_normalize(json_response, record_path=['entities'])
        display(df2)
    else:
        print("Error: HTTP status code " + str(response.status_code))
Update, based on a comment:

"I have over 5000 IDs that need to be passed one by one. How can this be passed in chunks of 20 each, maybe?"
Build URLs of the form ...?ids=1&ids=2&ids=3... with no more than 20 IDs per URL.
from itertools import islice

def chunker(it, chunksize):
    # yield successive lists of up to chunksize items from the iterable
    iterator = iter(it)
    while chunk := list(islice(iterator, chunksize)):
        yield chunk

for id_chunk in chunker([row.ID for row in listOfRows], 20):
    response = requests.get(
        f'{url}?ids=' + '&ids='.join(id_chunk),
        headers=headers
    )
The chunker() will split an iterable into lists with length <= chunksize. First filter listOfRows down to just the IDs, then chunk the IDs into lists of length 20, build the URL, and make the request.
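For completeness, here is one way the chunked loop could collect and combine the results. This is only a sketch: it assumes each batched response still exposes the same 'entities' record path as the single-ID responses above, and it reuses the chunker() defined here.

all_frames = []
for id_chunk in chunker([row.ID for row in listOfRows], 20):
    response = requests.get(
        f'{url}?ids=' + '&ids='.join(id_chunk),
        headers=headers
    )
    if response.status_code == 200:
        # assumes the batched response has the same 'entities' structure as a single-ID response
        all_frames.append(pd.json_normalize(response.json(), record_path=['entities']))
    else:
        print("Error: HTTP status code " + str(response.status_code))

# combine the per-chunk results into a single DataFrame (all_frames must be non-empty)
df_all = pd.concat(all_frames, ignore_index=True)
display(df_all)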
Thank you kafran for chunker().