Tags: python, python-3.x, python-requests, grequests

How to throttle GET requests from a list of URLs


I have a list of ~250,000 URLs that I need to get data from via an API.

I have created a class using the grequests library to make asynchronous calls. However, the API limit is 100 calls per second, which grequests exceeds.

Code using grequests:

import grequests

lst = ['url.com', 'url2.com']

class Test:
    def __init__(self):
        self.urls = lst

    def exception(self, request, exception):
        # called by grequests.map when a request fails
        print("Problem: {}: {}".format(request.url, exception))

    def async_get(self):
        # 'async' is a reserved word in Python 3.7+, hence the rename
        return grequests.map((grequests.get(u) for u in self.urls),
                             exception_handler=self.exception, size=100000)

    def collate_responses(self, results):
        return [x.text for x in results]

test = Test()
# here we collect the responses returned by the asynchronous calls
results = test.async_get()
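
Note that grequests.map's size argument only caps how many requests are in flight at once; it does not limit calls per second, so even size=100 can burst past the API's rate limit when responses come back quickly. A minimal illustration of that distinction, using the same list:

import grequests

urls = ['url.com', 'url2.com']
# size=100 bounds concurrency, not throughput: 100 in-flight slots
# can complete far more than 100 calls within a single second
responses = grequests.map((grequests.get(u) for u in urls), size=100)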

Is there any way I can use the requests library to make 100 calls per second?

I tried requests, but it times out after roughly 100,000 calls.

In this case I am passing an ID into the URL:

import time
import requests

L = [1, 2, 3]
lst = []

for i in L:
    url = 'url.com/Id={}'.format(i)
    xml_data1 = requests.get(url).text
    lst.append(xml_data1)
    time.sleep(1)  # one request per second
    print(xml_data1)
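
One possible contributor to the timeouts after roughly 100,000 calls is opening a fresh TCP connection for every request. A minimal sketch of reusing a requests.Session with an explicit timeout (the 0.01-second sleep is illustrative pacing for roughly 100 calls per second, not a hard guarantee):

import time
import requests

L = [1, 2, 3]
lst = []
session = requests.Session()  # reuses connections instead of opening one per call

for i in L:
    url = 'url.com/Id={}'.format(i)
    lst.append(session.get(url, timeout=10).text)
    time.sleep(0.01)  # roughly 100 calls per second at most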

Solution

  • Use multithreading. (A stricter, rate-limited variant is sketched after this answer.)

    import time
    import requests
    from multiprocessing.dummy import Pool as ThreadPool

    lst = []

    def some_fun(url):
        # fetch a single URL; the sleep throttles each worker thread
        xml_data1 = requests.get(url).text
        lst.append(xml_data1)
        time.sleep(1)
        print(xml_data1)

    if __name__ == '__main__':
        urls = ['url.com', 'url2.com']
        # 30 threads, each sleeping 1 s per call, is roughly 30 requests
        # per second; raise the pool size toward 100 to approach the limit
        c_pool = ThreadPool(30)
        c_pool.map(some_fun, urls)
        c_pool.close()
        c_pool.join()
    

    Cheers!
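
If the 100-calls-per-second cap must be enforced exactly rather than approximated with per-thread sleeps, a small shared limiter can space the calls out across all threads. A minimal sketch, assuming the same Id-based URLs (the RateLimiter class and fetch helper are illustrative names, not part of the original answer):

import threading
import time
import requests
from multiprocessing.dummy import Pool as ThreadPool

class RateLimiter:
    """Allow at most `rate` acquisitions per second across all threads."""
    def __init__(self, rate):
        self.interval = 1.0 / rate
        self.lock = threading.Lock()
        self.next_slot = time.monotonic()

    def acquire(self):
        with self.lock:
            now = time.monotonic()
            # reserve the next free slot, spaced self.interval apart
            self.next_slot = max(self.next_slot, now) + self.interval
            wait = self.next_slot - self.interval - now
        if wait > 0:
            time.sleep(wait)

limiter = RateLimiter(100)  # hard cap: 100 calls per second

def fetch(url):
    limiter.acquire()  # blocks until a slot is free
    return requests.get(url, timeout=10).text

if __name__ == '__main__':
    urls = ['url.com/Id={}'.format(i) for i in [1, 2, 3]]
    pool = ThreadPool(100)  # pool size matches the per-second budget
    results = pool.map(fetch, urls)
    pool.close()
    pool.join()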