Search code examples
pythonhttptwistedpacket

How to detect HTTP Request in python + twisted?


I am learning network programming using twisted 10 in python. In below code is there any way to detect HTTP Request when data recieved? also retrieve Domain name, Sub Domain, Port values from this? Discard it if its not http data?

from twisted.internet import stdio, reactor, protocol

from twisted.protocols import basic

import re



class DataForwardingProtocol(protocol.Protocol):

    def _ _init_ _(self):

        self.output = None

        self.normalizeNewlines = False



    def dataReceived(self, data):

        if self.normalizeNewlines:

            data = re.sub(r"(\r\n|\n)", "\r\n", data)

        if self.output:

            self.output.write(data)



class StdioProxyProtocol(DataForwardingProtocol):

    def connectionMade(self):

        inputForwarder = DataForwardingProtocol( )

        inputForwarder.output = self.transport

        inputForwarder.normalizeNewlines = True

        stdioWrapper = stdio.StandardIO(inputForwarder)

        self.output = stdioWrapper

        print "Connected to server.  Press ctrl-C to close connection."



class StdioProxyFactory(protocol.ClientFactory):

    protocol = StdioProxyProtocol



    def clientConnectionLost(self, transport, reason):

        reactor.stop( )



    def clientConnectionFailed(self, transport, reason):

        print reason.getErrorMessage( )

        reactor.stop( )



if __name__ == '_ _main_ _':

    import sys

    if not len(sys.argv) == 3:

        print "Usage: %s host port" % _ _file_ _

        sys.exit(1)



    reactor.connectTCP(sys.argv[1], int(sys.argv[2]), StdioProxyFactory( ))

    reactor.run( )

Solution

  • protocol.dataReceived, which you're overriding, is too low-level to serve for the purpose without smart buffering that you're not doing -- per the docs I just quoted,

    Called whenever data is received.

    Use this method to translate to a higher-level message. Usually, some callback will be made upon the receipt of each complete protocol message.

    Parameters

    data
    

    a string of indeterminate length. Please keep in mind that you will probably need to buffer some data, as partial (or multiple) protocol messages may be received! I recommend that unit tests for protocols call through to this method with differing chunk sizes, down to one byte at a time.

    You appear to be completely ignoring this crucial part of the docs.

    You could instead use LineReceiver.lineReceived (inheriting from protocols.basic.LineReceiver, of course) to take advantage of the fact that HTTP requests come in "lines" -- you'll still need to join up headers that are being sent as multiple lines, since as this tutorial says:

    Header lines beginning with space or tab are actually part of the previous header line, folded into multiple lines for easy reading.

    Once you have a nicely formatted/parsed response (consider studying twisted.web's sources so see one way it could be done),

    retrieve Domain name, Sub Domain, Port values from this?

    now the Host header (cfr the RFC section 14.23) is the one containing this info.