Tags: python, caching, python-requests

Python Requests-Cache still querying remote URL


I'm using the Requests-Cache library to cache results from Requests. It appears to install a cache just fine; requesting a URL creates a .sqlite cache file, and subsequent requests retrieve that data, even if the remote page changes.

My internet connection is rather poor today, and I noticed my script (which makes many supposedly cached requests) was running slowly. As a quick sanity check, I ran a test script once to populate the cache, then ran it again after disconnecting my computer from wifi. However, the second run errors out:

requests.exceptions.ConnectionError: HTTPConnectionPool(host='magicplugin.normalitycomics.com', port=80): Max retries exceeded with url: /update/updatelist.txt (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x110390d68>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known'))

Why is the request even trying to connect to the remote site, if Requests-Cache is redirecting it to use the local cached data? Is there a way to avoid this? I don't want to slow down my script (particularly when my connection is poor) by making unnecessary requests to the server.


Solution

  • I figured it out!

    My actual code makes requests that sometimes successfully get pages, and sometimes get a 404.

    The only reason my simple test script replicated the problem was that I made a typo in the page I was requesting. Requests received a 404. Even though Requests-Cache created a cache file, it did not store this result in it.

    It turns out that, by default, Requests-Cache only caches responses with status code 200, but this is configurable via `allowable_codes`:

    requests_cache.install_cache('example_cache', allowable_codes=(200, 404))
    

    And now it works fine!