
Accessing TWiki page with Python http.client


I'm trying to access my local TWiki installation with Python's http.client. For some reason I always end up with 403 Forbidden. I can access other subfolders on my server, but not TWiki. I can access the same TWiki page with curl. Is there something special you need to do when accessing /bin/ or /cgi-bin/ folders with http.client?

Here is an example against the twiki.org pages, because my localhost is not accessible from outside (httplib is the Python 2 name for http.client):

>>> import httplib
>>> conn = httplib.HTTPConnection("twiki.org")
>>> conn.request("GET", "/cgi-bin/view/")
>>> r1 = conn.getresponse()
>>> print r1.status, r1.reason
403 Forbidden
>>> data1 = r1.read()
>>> data1
'<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">\n<html><head>\n<title>403 Forbidden</title>\n</head><body>\n<h1>Forbidden</h1>\n<p>You don\'t have permission to access /cgi-bin/view/\non this server.</p>\n<hr>\n<address>Apache/2.2.3 (CentOS) Server at twiki.org Port 80</address>\n</body></html>\n'
>>> 

Solution

  • I just tried this myself and I found that setting a User-Agent header seemed to fix it. It didn't seem to matter what the header was, simply that it was set:

    >>> import httplib
    >>> conn = httplib.HTTPConnection("twiki.org")
    >>> conn.request("GET", "/cgi-bin/view/", headers={"User-Agent": "foo"})
    >>> r1 = conn.getresponse()
    >>> print r1.status, r1.reason
    200 OK
    

    Unfortunately I can't shed any light on why TWiki returns a 403 without a User-Agent header. I only tried it because it's one of the likely differences between clients. My guess is that the server uses the header to decide whether to return the mobile version of the site, but it's really poor not to handle a missing header gracefully.

    Hopefully that at least provides a work-around for you, however.
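On Python 3, where httplib has been renamed to http.client, the same work-around might look like the sketch below. The helper name `fetch_status` and its parameters are made up for illustration. It relies on the fact that http.client does not add a User-Agent header on its own, so the header is only sent when you pass one.

```python
import http.client


def fetch_status(host, path, user_agent=None, port=80, timeout=10):
    """Send a GET request and return (status, reason).

    Hypothetical helper: the User-Agent header is sent only when
    user_agent is given, since http.client adds none by default.
    """
    headers = {}
    if user_agent is not None:
        headers["User-Agent"] = user_agent

    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path, headers=headers)
        response = conn.getresponse()
        response.read()  # drain the body so the connection closes cleanly
        return response.status, response.reason
    finally:
        conn.close()
```

Calling `fetch_status("twiki.org", "/cgi-bin/view/", user_agent="foo")` would correspond to the session above. The result depends on how the server is configured at the time.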

    EDIT

    Apparently this is part of their default Apache config, which uses the BrowserMatchNoCase directive to set an environment variable named blockAccess. That variable is presumably checked later to return the observed 403 Forbidden response.

    They seem to think that this prevents DoS attacks somehow, although I'm really unconvinced by anything that can be worked around by simply setting a random User-Agent string. As you can tell from that config, they also have a list of "known bad" user agents they attempt to block. You can observe this by using one of them to fetch the page from the command line (GET here is the lwp-request tool from libwww-perl):

    $ GET -Ssed -H "User-Agent: some-random-name" http://twiki.org/cgi-bin/view/
    GET http://twiki.org/cgi-bin/view/
    200 OK
    [...]
    $ GET -Ssed -H "User-Agent: FAST" http://twiki.org/cgi-bin/view/
    GET http://twiki.org/cgi-bin/view/
    403 Forbidden
    [...]
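To make the blocking logic concrete, here is a rough Python model of what a set of BrowserMatchNoCase rules does: each pattern is matched case-insensitively against the User-Agent value, and any match marks the request as blocked. The pattern list is an assumption for illustration (an empty-string pattern, plus "FAST" from the example above), not a copy of the real TWiki config, and `is_blocked` is a made-up name.

```python
import re

# Hypothetical patterns; the real list lives in the server's Apache config.
# r"^$" stands for an empty or missing User-Agent header.
BLOCKED_AGENT_PATTERNS = [r"^$", r"^FAST"]


def is_blocked(user_agent):
    """Return True if the User-Agent matches any blocked pattern.

    A missing header (None) is treated as an empty string, which is
    assumed to be how the server evaluates it.
    """
    agent = user_agent or ""
    return any(
        re.search(pattern, agent, re.IGNORECASE)
        for pattern in BLOCKED_AGENT_PATTERNS
    )
```

Under these assumed patterns, a request with no User-Agent and a request with "FAST" are both rejected, while an arbitrary string such as "some-random-name" passes. That matches the responses observed above.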
    

    I'm sure they have their reasons for doing this, but I must say I'm not impressed.