Tags: python, callback, python-requests, urllib3

Adding a callback function on each retry attempt using requests/urllib3


I've implemented a retry mechanism for a requests Session using urllib3.util.retry, as suggested both here and here.

Now I am trying to figure out the best way to add a callback function that will be called on every retry attempt.

To explain further: if either the Retry object or the requests get method had a way to register a callback function, that would be great. Maybe something like:

import requests
from requests.packages.urllib3.util.retry import Retry
from requests.adapters import HTTPAdapter

def retry_callback(url):
    print(url)

s = requests.Session()
retries = Retry(total=5, status_forcelist=[ 500, 502, 503, 504 ])
s.mount('http://', HTTPAdapter(max_retries=retries))

url = 'http://httpstat.us/500'
s.get(url, callback=retry_callback, callback_params=[url])

I know that for printing the URL I could just use logging, but this is only a simple example of a more complex use case.


Solution

  • You can subclass the Retry class to add that functionality.

    This is the full interaction flow with the Retry instance for a given connection attempt:

    • Retry.increment() is called with the current method, url, response object (if there is one), and exception (if one was raised) whenever an exception is raised, a 3xx redirection response is returned, or the Retry.is_retry() method returns true.
      • .increment() will re-raise the error if there was one and the object was configured not to retry that specific class of errors.
      • .increment() calls Retry.new() to create an updated instance, with any relevant counters updated and the history attribute amended with a new RequestHistory() instance (a named tuple).
      • .increment() will raise a MaxRetryError exception if Retry.is_exhausted() called on the return value of Retry.new() is true. is_exhausted() returns true when any of the counters it tracks has dropped below 0 (counters set to None are ignored).
      • .increment() returns the new Retry instance.
    • The return value of Retry.increment() replaces the old Retry instance being tracked. If there was a redirect, Retry.sleep_for_retry() is called (sleeping only if there was a Retry-After header); otherwise Retry.sleep() is called (which first calls self.sleep_for_retry() to honor a Retry-After header, and otherwise just sleeps according to the back-off policy, if any). Then a recursive connection call is made with the new Retry instance.
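
    To see that bookkeeping in isolation, you can call Retry.increment() by hand rather than through a connection pool. This is just an illustrative sketch (the counter values shown assume the example policy below; details may vary slightly between urllib3 versions):

    from urllib3.util.retry import Retry

    r = Retry(total=2, status_forcelist=[500])

    # Simulate what the connection pool does on a failed attempt: increment()
    # returns a *new* Retry instance with the counters decremented and the
    # history amended (no response or error is passed here, so it is counted
    # as a generic failure).
    r2 = r.increment(method='GET', url='/500')

    print(r2 is r)      # False -- a fresh instance; the original is unchanged
    print(r2.total)     # 1 -- decremented from 2
    print(r2.history)   # a tuple containing one RequestHistory named tuple

    # Once any tracked counter drops below 0, increment() raises MaxRetryError
    # instead of returning a new instance.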

    This gives you 3 good callback points: at the start of .increment(), when creating the new Retry instance in .new(), and in a context manager around super().increment() to let a callback veto an exception or update the returned retry policy on exit.
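
    The example below covers the first of those points. As a rough sketch of the third, wrapping super().increment() could look like the following (the AuditedRetry name, the audit keyword and its convention of vetoing by returning a replacement Retry instance are made up for illustration; they are not part of urllib3's API):

    import logging

    from urllib3.util.retry import Retry

    logger = logging.getLogger(__name__)

    class AuditedRetry(Retry):
        # The audit hook is called as audit(url, new_retry_or_None, exc_or_None);
        # returning a Retry instance overrides the normal outcome, which lets
        # the callback veto a re-raised exception or adjust the policy used
        # for the next attempt.
        def __init__(self, *args, **kwargs):
            self._audit = kwargs.pop('audit', None)
            super(AuditedRetry, self).__init__(*args, **kwargs)
        def new(self, **kw):
            kw['audit'] = self._audit
            return super(AuditedRetry, self).new(**kw)
        def increment(self, method=None, url=None, *args, **kwargs):
            try:
                new_retry = super(AuditedRetry, self).increment(
                    method, url, *args, **kwargs)
            except Exception as exc:
                if self._audit is not None:
                    override = self._audit(url, None, exc)
                    if isinstance(override, Retry):
                        return override   # veto: keep going with this policy
                raise
            if self._audit is not None:
                override = self._audit(url, new_retry, None)
                if isinstance(override, Retry):
                    return override       # swap in an adjusted policy
            return new_retry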

    This is what putting a hook on the start of .increment() would look like:

    import logging

    from urllib3.util.retry import Retry

    logger = logging.getLogger(__name__)
    
    class CallbackRetry(Retry):
        def __init__(self, *args, **kwargs):
            self._callback = kwargs.pop('callback', None)
            super(CallbackRetry, self).__init__(*args, **kwargs)
        def new(self, **kw):
            # pass along the subclass additional information when creating
            # a new instance.
            kw['callback'] = self._callback
            return super(CallbackRetry, self).new(**kw)
        def increment(self, method, url, *args, **kwargs):
            if self._callback:
                try:
                    self._callback(url)
                except Exception:
                    logger.exception('Callback raised an exception, ignoring')
            return super(CallbackRetry, self).increment(method, url, *args, **kwargs)
    

    Note: the url argument is really only the URL path; the network location portion of the request is omitted (you'd have to extract that from the _pool argument, which has .scheme, .host and .port attributes).
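
    If your callback needs the full URL, a variation along these lines could work. It is only a sketch (the FullURLCallbackRetry name is made up here, and _pool is a private argument of urllib3's increment(), so this leans on implementation details):

    import logging

    from urllib3.util.retry import Retry

    logger = logging.getLogger(__name__)

    class FullURLCallbackRetry(Retry):
        def __init__(self, *args, **kwargs):
            self._callback = kwargs.pop('callback', None)
            super(FullURLCallbackRetry, self).__init__(*args, **kwargs)
        def new(self, **kw):
            kw['callback'] = self._callback
            return super(FullURLCallbackRetry, self).new(**kw)
        def increment(self, method=None, url=None, *args, **kwargs):
            # _pool is the HTTPConnectionPool handling the request; it knows
            # the scheme, host and port that the path in `url` belongs to.
            pool = kwargs.get('_pool')
            full_url = url
            if pool is not None:
                netloc = pool.host if pool.port is None else '%s:%s' % (pool.host, pool.port)
                full_url = '%s://%s%s' % (pool.scheme, netloc, url or '')
            if self._callback:
                try:
                    self._callback(full_url)
                except Exception:
                    logger.exception('Callback raised an exception, ignoring')
            return super(FullURLCallbackRetry, self).increment(method, url, *args, **kwargs)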

    Demo:

    >>> def retry_callback(url):
    ...     print('Callback invoked with', url)
    ...
    >>> s = requests.Session()
    >>> retries = CallbackRetry(total=5, status_forcelist=[500, 502, 503, 504], callback=retry_callback)
    >>> s.mount('http://', HTTPAdapter(max_retries=retries))
    >>> s.get('http://httpstat.us/500')
    Callback invoked with /500
    Callback invoked with /500
    Callback invoked with /500
    Callback invoked with /500
    Callback invoked with /500
    Callback invoked with /500
    Traceback (most recent call last):
      File "/.../lib/python3.6/site-packages/requests/adapters.py", line 440, in send
        timeout=timeout
      File "/.../lib/python3.6/site-packages/urllib3/connectionpool.py", line 732, in urlopen
        body_pos=body_pos, **response_kw)
      File "/.../lib/python3.6/site-packages/urllib3/connectionpool.py", line 732, in urlopen
        body_pos=body_pos, **response_kw)
      File "/.../lib/python3.6/site-packages/urllib3/connectionpool.py", line 732, in urlopen
        body_pos=body_pos, **response_kw)
      [Previous line repeated 1 more times]
      File "/.../lib/python3.6/site-packages/urllib3/connectionpool.py", line 712, in urlopen
        retries = retries.increment(method, url, response=response, _pool=self)
      File "<stdin>", line 8, in increment
      File "/.../lib/python3.6/site-packages/urllib3/util/retry.py", line 388, in increment
        raise MaxRetryError(_pool, url, error or ResponseError(cause))
    urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='httpstat.us', port=80): Max retries exceeded with url: /500 (Caused by ResponseError('too many 500 error responses',))
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/.../lib/python3.6/site-packages/requests/sessions.py", line 521, in get
        return self.request('GET', url, **kwargs)
      File "/.../lib/python3.6/site-packages/requests/sessions.py", line 508, in request
        resp = self.send(prep, **send_kwargs)
      File "/.../lib/python3.6/site-packages/requests/sessions.py", line 618, in send
        r = adapter.send(request, **kwargs)
      File "/.../lib/python3.6/site-packages/requests/adapters.py", line 499, in send
        raise RetryError(e, request=request)
    requests.exceptions.RetryError: HTTPConnectionPool(host='httpstat.us', port=80): Max retries exceeded with url: /500 (Caused by ResponseError('too many 500 error responses',))
    

    Putting a hook in the .new() method would let you adjust the policy for the next attempt, as well as introspect the .history attribute, but it would not let you avoid the exception re-raising.
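
    For example, a minimal sketch of such a hook (the HistoryAwareRetry name and the log message are just illustrative; RequestHistory entries have method, url, error, status and redirect_location fields):

    import logging

    from urllib3.util.retry import Retry

    logger = logging.getLogger(__name__)

    class HistoryAwareRetry(Retry):
        def new(self, **kw):
            # .new() is called by .increment() with the updated counters and
            # history, so the freshly created instance already knows about the
            # attempt that just failed.
            new_retry = super(HistoryAwareRetry, self).new(**kw)
            if new_retry.history:
                last = new_retry.history[-1]   # a RequestHistory named tuple
                logger.info('Attempt %d for %s failed (status=%s, error=%s)',
                            len(new_retry.history), last.url, last.status, last.error)
            return new_retry

    Since super().new(**kw) uses whatever you put in kw, the same hook could also tweak the policy for the next attempt, e.g. by adjusting kw['backoff_factor'] before delegating.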