Tags: python, callback, python-requests, urllib3

Adding a callback function on each retry attempt using requests/urllib3


I've implemented a retry mechanism for a requests Session using urllib3.util.retry, as suggested both here and here.

Now I am trying to figure out the best way to add a callback function that will be called on every retry attempt.

To explain further: if either the Retry object or the requests get method had a way to register a callback function, that would be great. Maybe something like:

import requests
from requests.packages.urllib3.util.retry import Retry
from requests.adapters import HTTPAdapter

def retry_callback(url):
    print(url)

s = requests.Session()
retries = Retry(total=5, status_forcelist=[ 500, 502, 503, 504 ])
s.mount('http://', HTTPAdapter(max_retries=retries))

url = 'http://httpstat.us/500'
s.get(url, callback=retry_callback, callback_params=[url])

I know that for printing the URL I could just use logging, but this is only a simple example of a more complex use case.


Solution

  • You can subclass the Retry class to add that functionality.

    This is the full interaction flow with the Retry instance for a given connection attempt:

    • Retry.increment() is called with the current method, url, response object (if there is one), and exception (if one was raised) whenever an exception is raised, a 3xx redirection response is returned, or the Retry.is_retry() method returns true.
      • .increment() will re-raise the error if there was one and the object was configured not to retry that specific class of errors.
      • .increment() calls Retry.new() to create an updated instance, with any relevant counters updated and the history attribute amended with a new RequestHistory() instance (a named tuple).
      • .increment() will raise a MaxRetryError exception if Retry.is_exhausted() called on the return value of Retry.new() is true. is_exhausted() returns true when any of the counters it tracks has dropped below 0 (counters set to None are ignored).
      • .increment() returns the new Retry instance.
    • The return value of Retry.increment() replaces the old Retry instance being tracked. If there was a redirect, Retry.sleep_for_retry() is called (sleeping only if there was a Retry-After header); otherwise Retry.sleep() is called (which first calls self.sleep_for_retry() to honor a Retry-After header, and otherwise just sleeps according to the back-off policy, if any). Then a recursive connection call is made with the new Retry instance.
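
    To see that bookkeeping in isolation, you can call Retry.increment() by hand rather than through a connection pool. This is just an illustrative sketch (the counter values shown assume the example policy below; details may vary slightly between urllib3 versions):

    from urllib3.util.retry import Retry

    r = Retry(total=2, status_forcelist=[500])

    # Simulate what the connection pool does on a failed attempt: increment()
    # returns a *new* Retry instance with the counters decremented and the
    # history amended (no response or error is passed here, so it is counted
    # as a generic failure).
    r2 = r.increment(method='GET', url='/500')

    print(r2 is r)      # False -- a fresh instance; the original is unchanged
    print(r2.total)     # 1 -- decremented from 2
    print(r2.history)   # a tuple containing one RequestHistory named tuple

    # Once any tracked counter drops below 0, increment() raises MaxRetryError
    # instead of returning a new instance.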

    This gives you 3 good callback points: at the start of .increment(), when creating the new Retry instance in .new(), and in a context manager around super().increment() to let a callback veto an exception or update the returned retry policy on exit.
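
    The example below covers the first of those points. As a rough sketch of the third, wrapping super().increment() could look like the following (the AuditedRetry name, the audit keyword and its convention of vetoing by returning a replacement Retry instance are made up for illustration; they are not part of urllib3's API):

    import logging

    from urllib3.util.retry import Retry

    logger = logging.getLogger(__name__)

    class AuditedRetry(Retry):
        # The audit hook is called as audit(url, new_retry_or_None, exc_or_None);
        # returning a Retry instance overrides the normal outcome, which lets
        # the callback veto a re-raised exception or adjust the policy used
        # for the next attempt.
        def __init__(self, *args, **kwargs):
            self._audit = kwargs.pop('audit', None)
            super(AuditedRetry, self).__init__(*args, **kwargs)
        def new(self, **kw):
            kw['audit'] = self._audit
            return super(AuditedRetry, self).new(**kw)
        def increment(self, method=None, url=None, *args, **kwargs):
            try:
                new_retry = super(AuditedRetry, self).increment(
                    method, url, *args, **kwargs)
            except Exception as exc:
                if self._audit is not None:
                    override = self._audit(url, None, exc)
                    if isinstance(override, Retry):
                        return override   # veto: keep going with this policy
                raise
            if self._audit is not None:
                override = self._audit(url, new_retry, None)
                if isinstance(override, Retry):
                    return override       # swap in an adjusted policy
            return new_retry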

    This is what putting a hook on the start of .increment() would look like:

    import logging

    from urllib3.util.retry import Retry

    logger = logging.getLogger(__name__)
    
    class CallbackRetry(Retry):
        def __init__(self, *args, **kwargs):
            self._callback = kwargs.pop('callback', None)
            super(CallbackRetry, self).__init__(*args, **kwargs)
        def new(self, **kw):
            # pass along the subclass additional information when creating
            # a new instance.
            kw['callback'] = self._callback
            return super(CallbackRetry, self).new(**kw)
        def increment(self, method, url, *args, **kwargs):
            if self._callback:
                try:
                    self._callback(url)
                except Exception:
                    logger.exception('Callback raised an exception, ignoring')
            return super(CallbackRetry, self).increment(method, url, *args, **kwargs)
    

    Note: the url argument is really only the URL path; the network location portion of the request is omitted (you'd have to extract that from the _pool argument, which has .scheme, .host and .port attributes).
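
    If your callback needs the full URL, a variation along these lines could work. It is only a sketch (the FullURLCallbackRetry name is made up here, and _pool is a private argument of urllib3's increment(), so this leans on implementation details):

    import logging

    from urllib3.util.retry import Retry

    logger = logging.getLogger(__name__)

    class FullURLCallbackRetry(Retry):
        def __init__(self, *args, **kwargs):
            self._callback = kwargs.pop('callback', None)
            super(FullURLCallbackRetry, self).__init__(*args, **kwargs)
        def new(self, **kw):
            kw['callback'] = self._callback
            return super(FullURLCallbackRetry, self).new(**kw)
        def increment(self, method=None, url=None, *args, **kwargs):
            # _pool is the HTTPConnectionPool handling the request; it knows
            # the scheme, host and port that the path in `url` belongs to.
            pool = kwargs.get('_pool')
            full_url = url
            if pool is not None:
                netloc = pool.host if pool.port is None else '%s:%s' % (pool.host, pool.port)
                full_url = '%s://%s%s' % (pool.scheme, netloc, url or '')
            if self._callback:
                try:
                    self._callback(full_url)
                except Exception:
                    logger.exception('Callback raised an exception, ignoring')
            return super(FullURLCallbackRetry, self).increment(method, url, *args, **kwargs)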

    Demo:

    >>> def retry_callback(url):
    ...     print('Callback invoked with', url)
    ...
    >>> s = requests.Session()
    >>> retries = CallbackRetry(total=5, status_forcelist=[500, 502, 503, 504], callback=retry_callback)
    >>> s.mount('http://', HTTPAdapter(max_retries=retries))
    >>> s.get('http://httpstat.us/500')
    Callback invoked with /500
    Callback invoked with /500
    Callback invoked with /500
    Callback invoked with /500
    Callback invoked with /500
    Callback invoked with /500
    Traceback (most recent call last):
      File "/.../lib/python3.6/site-packages/requests/adapters.py", line 440, in send
        timeout=timeout
      File "/.../lib/python3.6/site-packages/urllib3/connectionpool.py", line 732, in urlopen
        body_pos=body_pos, **response_kw)
      File "/.../lib/python3.6/site-packages/urllib3/connectionpool.py", line 732, in urlopen
        body_pos=body_pos, **response_kw)
      File "/.../lib/python3.6/site-packages/urllib3/connectionpool.py", line 732, in urlopen
        body_pos=body_pos, **response_kw)
      [Previous line repeated 1 more times]
      File "/.../lib/python3.6/site-packages/urllib3/connectionpool.py", line 712, in urlopen
        retries = retries.increment(method, url, response=response, _pool=self)
      File "<stdin>", line 8, in increment
      File "/.../lib/python3.6/site-packages/urllib3/util/retry.py", line 388, in increment
        raise MaxRetryError(_pool, url, error or ResponseError(cause))
    urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='httpstat.us', port=80): Max retries exceeded with url: /500 (Caused by ResponseError('too many 500 error responses',))
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/.../lib/python3.6/site-packages/requests/sessions.py", line 521, in get
        return self.request('GET', url, **kwargs)
      File "/.../lib/python3.6/site-packages/requests/sessions.py", line 508, in request
        resp = self.send(prep, **send_kwargs)
      File "/.../lib/python3.6/site-packages/requests/sessions.py", line 618, in send
        r = adapter.send(request, **kwargs)
      File "/.../lib/python3.6/site-packages/requests/adapters.py", line 499, in send
        raise RetryError(e, request=request)
    requests.exceptions.RetryError: HTTPConnectionPool(host='httpstat.us', port=80): Max retries exceeded with url: /500 (Caused by ResponseError('too many 500 error responses',))
    

    Putting a hook in the .new() method would let you adjust the policy for the next attempt, as well as introspect the .history attribute, but it would not let you avoid the exception re-raising.
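
    For example, a minimal sketch of such a hook (the HistoryAwareRetry name and the log message are just illustrative; RequestHistory entries have method, url, error, status and redirect_location fields):

    import logging

    from urllib3.util.retry import Retry

    logger = logging.getLogger(__name__)

    class HistoryAwareRetry(Retry):
        def new(self, **kw):
            # .new() is called by .increment() with the updated counters and
            # history, so the freshly created instance already knows about the
            # attempt that just failed.
            new_retry = super(HistoryAwareRetry, self).new(**kw)
            if new_retry.history:
                last = new_retry.history[-1]   # a RequestHistory named tuple
                logger.info('Attempt %d for %s failed (status=%s, error=%s)',
                            len(new_retry.history), last.url, last.status, last.error)
            return new_retry

    Since super().new(**kw) uses whatever you put in kw, the same hook could also tweak the policy for the next attempt, e.g. by adjusting kw['backoff_factor'] before delegating.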