Python: Get the redirect URLs using cURL


I am interested in getting the intermediate URLs in a redirect chain using PycURL. Say I have a website, Site A, which redirects to Site B, which then redirects to Site C. Normally I can only see Site A (the starting URL) and Site C (the final URL), but I am also interested in any sites that sit between the two (in this case Site B). How would I go about doing this?


Solution

  • Have a look at PycURL callbacks. A header callback sees every response header, including the Location header of each redirect:

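    import sys
    import pycurl

    ## Callback function invoked when header data is ready
    def header(buf):
        # Under Python 3, pycurl passes each header line as bytes
        sys.stdout.write(buf.decode("iso-8859-1"))
        # Returning None implies that all bytes were written

    c = pycurl.Curl()
    c.setopt(pycurl.URL, "http://www.siteA.com/")
    # Follow redirects so the headers of every hop reach the callback
    c.setopt(pycurl.FOLLOWLOCATION, True)
    c.setopt(pycurl.HEADERFUNCTION, header)
    c.perform()
    c.close()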
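
If you want the intermediate URLs as a list rather than printed headers, one option is to pick out the Location lines inside the callback. The following is only a sketch under a few assumptions: `RedirectCollector` and `fetch_redirect_chain` are hypothetical names made up for this example, not part of PycURL, and every Location header is treated as a hop, even one sent on a non-redirect response.

```python
from urllib.parse import urljoin


class RedirectCollector:
    """Collects every URL in a redirect chain from raw header lines."""

    def __init__(self, start_url):
        self.chain = [start_url]

    def header_callback(self, buf):
        # pycurl hands over one header line per call, as bytes.
        line = buf.decode("iso-8859-1")
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "location":
            # Location may be relative, so resolve it against the previous hop.
            self.chain.append(urljoin(self.chain[-1], value.strip()))


def fetch_redirect_chain(url):
    # Imported here so RedirectCollector stays usable without pycurl installed.
    import pycurl

    collector = RedirectCollector(url)
    c = pycurl.Curl()
    c.setopt(pycurl.URL, url)
    c.setopt(pycurl.FOLLOWLOCATION, True)
    c.setopt(pycurl.HEADERFUNCTION, collector.header_callback)
    c.setopt(pycurl.WRITEFUNCTION, lambda data: None)  # discard the body
    try:
        c.perform()
    finally:
        c.close()
    return collector.chain
```

Calling `fetch_redirect_chain("http://www.siteA.com/")` should then return the starting URL followed by each redirect target in order, so Site B shows up between Site A and Site C.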