Search code examples
pythonpython-requests.netrc

Python requests module slow due to high number of internal function calls


On one of my machines the Python module requests is extremely slow, seemingly due to a very high number of internal function calls. What could lead to such an issue?

>>> import sys, cProfile, requests, urllib3
>>> sys.version
'3.10.6 (main, Mar 10 2023, 10:55:28) [GCC 11.3.0]'
>>> requests.__version__
'2.28.1' # same issue with latest 2.29.0 release

>>> cProfile.run('requests.get("http://127.0.0.1:8080")', sort='ncalls')
        11717750 function calls (11717748 primitive calls) in 10.091 seconds

    Ordered by: call count

    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   5220221    0.770   0.000     0.775    0.000 {method 'read' of '_io.TextIOWrapper' objects}
   5220221    0.801   0.000     0.801    0.000 shlex.py:68(punctuation_chars)
    468483    0.364   0.000     9.564    0.000 shlex.py:101(get_token)
    401557    7.612   0.000     9.188    0.000 shlex.py:133(read_token)
    200797    0.059   0.000     0.059    0.000 {method 'startswith' of 'str' objects}
     66931    0.012   0.000     0.012    0.000 {method 'popleft' of 'collections.deque' objects}
     66926    0.012   0.000     0.012    0.000 {method 'appendleft' of 'collections.deque' objects}
     66926    0.049   0.000     0.061    0.000 shlex.py:72(push_token)
      [..] # remaining calls with significantly lower numbers

>>> cProfile.run('http=urllib3.PoolManager(); http.request("GET", "http://127.0.0.1:8080")', sort='ncalls')
        784 function calls (783 primitive calls) in 0.005 seconds

On another machines, running the same version of Python and tested with same versions of requests, things look about right:

>>> cProfile.run('requests.get("http://127.0.0.1:8080")', sort='ncalls')
        2432 function calls (2396 primitive calls) in 0.005 seconds
  • Above tests were done using python3 -m http.server 8080 --bind 127.0.0.1, but same results for other HTTP servers
  • Most of the time spent is before a TCP connection is even created by looking at network traffic using tcpdump
  • As with urllib3, completely unrelated tools such as curl works just fine
  • While tests were made on IPv4 addresses, actively disabling IPv6 for requests using requests.packages.urllib3.util.connection.HAS_IPV6 = False did not have any effect either
  • Tested with clean virtualenvs

Solution

  • Using the Python profile viewer tuna I found that requests will attempt to get authentication credentials from a netrc file ($HOME/.netrc or $HOME/_netrc) if no other authentication method is specified:

    Instead of just looking at number of calls (ncalls), sorting by cumulative time (sort=cumtime) would also have helped highlight how the issuing of netrc.py and subsequent shlex.py were the main culprits.

    In my case I had a fairly large netrc file at thousands of entries. Upon trimming or removing the netrc file entirely the subsequent number of function calls shrunk accordingly.


    Ideally the performance of the netrc Python module wouldn't be such a bottleneck, as the file itself isn't all that complex, and other programs like curl has no issues handling it, but oh well.