My handler file:

# -*- coding:utf-8 -*-
import sys

from tornado import gen, web, httpclient

url = "https://mdetail.tmall.com/templates/pages/desc?id=527485572414"
headers = {}  # defined here so the snippet is self-contained; my real code sends real headers

class SearchHandler(web.RequestHandler):
    @gen.coroutine
    def get(self):
        async_client = httpclient.AsyncHTTPClient()
        print sys.getrefcount(async_client)  # fewer than 10 on the first request, then always greater than 200
        req = httpclient.HTTPRequest(url, "GET", headers=headers)
        req_lists = [async_client.fetch(req) for _ in range(200)]
        r = yield req_lists
        print sys.getrefcount(async_client)  # always greater than 200
        # The longer req_lists is, the more memory is consumed, and it never decreases
My configuration:

tornado.httpclient.AsyncHTTPClient.configure(client, max_clients=1000)
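(For context: AsyncHTTPClient.configure takes the implementation as a dotted class-name string, or None for the default. A minimal sketch of the curl configuration described below:)

from tornado import httpclient

# Select the libcurl-based client and allow up to 1000 concurrent requests.
httpclient.AsyncHTTPClient.configure(
    "tornado.curl_httpclient.CurlAsyncHTTPClient", max_clients=1000)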
If my client is "tornado.curl_httpclient.CurlAsyncHTTPClient", then when I visit my handler in a browser, htop shows memory usage increase by about 6 GB, and as long as the process is running, memory usage never decreases.

If I change range(200) to range(500) or higher, memory usage grows even higher.

If my client is None (the default SimpleAsyncHTTPClient), memory barely increases.

I found that only fetching https:// URLs has the memory issue.

How can I solve the memory problem with CurlAsyncHTTPClient?
Environment:
Ubuntu 16.10 x64
Python 2.7.12
Tornado 4.5.1
The reference counts you see are expected, because with max_clients=1000, Tornado will cache and reuse 1000 pycurl.Curl instances, each of which may hold a reference to the client's _curl_header_callback. You can see it with objgraph.show_backrefs.
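For example, a minimal sketch (objgraph must be installed, plus graphviz for the PNG output; async_client is the handler's client from your code):

import objgraph

# Render the chain of references keeping the client alive into a PNG;
# the cached Curl instances show up along these back-reference paths.
objgraph.show_backrefs([async_client], max_depth=4, filename="backrefs.png")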
Do you really need max_clients=1000 — that is, up to 1000 requests in parallel? (I'm hoping they're not all to the same server, as in your example!)
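If you don't, a smaller max_clients plus an explicit cap on in-flight fetches in your own code may be enough. A minimal sketch using tornado.locks.Semaphore (available since Tornado 4.2; the limit of 50 is an arbitrary choice):

from tornado import gen, locks

sem = locks.Semaphore(50)  # arbitrary cap on concurrent fetches

@gen.coroutine
def bounded_fetch(client, request):
    # Wait for a free slot, run the fetch, release the slot when done.
    with (yield sem.acquire()):
        response = yield client.fetch(request)
    raise gen.Return(response)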
Anyway, the Curl instances seem to be taking up a lot of memory.
On my system (Ubuntu 16.04), I can reproduce the problem when using PycURL linked against the system-wide libcurl3-gnutls 7.47.0:
$ /usr/bin/time -v python getter.py
6
207
^C
[...]
Maximum resident set size (kbytes): 4853544
When I link PycURL with a freshly built libcurl 7.54.1 (still with GnuTLS backend), I get a much better result:
$ LD_LIBRARY_PATH=$PWD/curl-prefix/lib /usr/bin/time -v python getter.py
6
207
^C
[...]
Maximum resident set size (kbytes): 1016084
And if I use libcurl with the OpenSSL backend, the result is better still:
Maximum resident set size (kbytes): 275572
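(You can check which libcurl and TLS backend your PycURL is actually linked against, e.g.:)

import pycurl
# Prints something like: PycURL/7.43.0 libcurl/7.54.1 OpenSSL/1.0.2g zlib/1.2.8
print pycurl.version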
There are other reports of memory problems with GnuTLS: curl issue #1086.
So, if you do need a large max_clients, try using a newer libcurl built with the OpenSSL backend.
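A rough sketch of one way to do this (paths and the prefix directory are hypothetical; PYCURL_SSL_LIBRARY tells PycURL's setup.py which TLS backend to expect):

$ ./configure --prefix=$PWD/../curl-prefix --with-ssl   # build libcurl against OpenSSL
$ make && make install
$ PYCURL_SSL_LIBRARY=openssl pip install --no-binary pycurl pycurl
$ LD_LIBRARY_PATH=$PWD/../curl-prefix/lib /usr/bin/time -v python getter.py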