Tags: python, python-2.7, curl, network-programming, pycurl

How to keep an inactive connection open with PycURL?


Pseudo-code to better explain the question:

#!/usr/bin/env python2.7
import pycurl, threading

def threaded_work():
    conn = pycurl.Curl()
    conn.setopt(pycurl.TIMEOUT, 10)

    # Make a request to host #1 just to open the connection to it.
    conn.setopt(pycurl.URL, 'https://host1.example.com/')
    conn.perform_rs()

    while not condition_that_may_take_very_long:
        conn.setopt(pycurl.URL, 'https://host2.example.com/')
        print 'Response from host #2: ' + conn.perform_rs()

    # Now, after what may be a very long time, we must request host #1 again with a (hopefully) already established connection.
    conn.setopt(pycurl.URL, 'https://host1.example.com/')
    print 'Response from host #1, hopefully with an already established connection from above: ' + conn.perform_rs()
    conn.close()

for _ in xrange(30):
    # Multiple threads must work with host #1 and host #2 individually.
    threading.Thread(target = threaded_work).start()

I am omitting extra, unnecessary details for brevity so that the main problem stays in focus.

As you can see, I have multiple threads that must each work with two different hosts, host #1 and host #2. Most of the time, the threads will be working with host #2 until a certain condition is met. That condition may take hours or even longer to be met, and will be met at different times in different threads. Once the condition (condition_that_may_take_very_long in the example) is met, I would like host #1 to be requested as fast as possible over the connection that I already established at the start of the threaded_work function. Is there an efficient way to accomplish this? I am also open to the suggestion of using two PycURL handles.


Solution

  • PycURL uses libcurl. By default, libcurl keeps connections alive after use, so as long as you keep the handle alive and use it for the subsequent transfer, the connection stays open and ready for reuse.

    However, with modern networks and network equipment (NATs, firewalls, web servers), connections without traffic are often killed off relatively soon, so an idle connection that still works after "hours" is a rare occurrence. Typically, libcurl will discover that the connection was killed in the meantime and create a new one at the next use.

    Additionally, and in line with what I've described above, since version 7.65.0 libcurl by default no longer reuses connections that have been idle for more than 118 seconds. This can be changed with the CURLOPT_MAXAGE_CONN option. The reason is that such connections barely ever work, so not having to keep them around, detect that they are dead, and reissue the request is an optimization.
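
As a rough illustration of the points above, here is a minimal sketch of a handle reserved for host #1, with the idle limit raised and a check on whether the old connection was actually reused. Several things here are assumptions rather than part of the answer: the helper names make_host1_handle and request_again are hypothetical, the six-hour value is arbitrary, and pycurl.MAXAGE_CONN is assumed to be how PycURL exposes CURLOPT_MAXAGE_CONN (it is looked up defensively because older builds may lack it). NUM_CONNECTS corresponds to libcurl's CURLINFO_NUM_CONNECTS. The pycurl module is passed in as curl_mod so the helpers can be tried with a stand-in object.

```python
def make_host1_handle(curl_mod, url, max_age_seconds=6 * 3600):
    """Create a handle reserved for host #1 and open its connection.

    curl_mod is the imported pycurl module; it is passed in explicitly so the
    helper can also be exercised with a stand-in object.
    """
    handle = curl_mod.Curl()
    handle.setopt(curl_mod.TIMEOUT, 10)
    # Assumption: MAXAGE_CONN is PycURL's name for CURLOPT_MAXAGE_CONN.
    # Older builds may not expose it, so look it up instead of assuming.
    max_age_opt = getattr(curl_mod, "MAXAGE_CONN", None)
    if max_age_opt is not None:
        handle.setopt(max_age_opt, max_age_seconds)
    handle.setopt(curl_mod.URL, url)
    handle.perform_rs()  # the first request opens the connection
    return handle


def request_again(curl_mod, handle):
    """Repeat the request and report whether the old connection was reused."""
    body = handle.perform_rs()
    # NUM_CONNECTS is the number of new connections the last transfer needed;
    # 0 means libcurl could still use the connection it had cached.
    reused = handle.getinfo(curl_mod.NUM_CONNECTS) == 0
    return body, reused
```

In threaded_work, this could be used as host1 = make_host1_handle(pycurl, 'https://host1.example.com/') before the loop, with a second pycurl.Curl() for host #2, and body, reused = request_again(pycurl, host1) once the condition is met. Raising the limit only stops libcurl from discarding the connection on its own. A NAT, firewall, or server can still drop it, in which case libcurl opens a new connection and reused comes back False.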