
Open file in an Apache Storm Spout with python


I am trying to make an Apache Storm spout read from a file line by line. I tried the code below, but it did not work: it emitted only the first line, over and over:

import storm

class SimSpout(storm.Spout):
    # Not much to do here for such a basic spout
    def initialize(self, conf, context):
        # Open the file read-only
        self.f = open('data.txt', 'r')
        self._conf = conf
        self._context = context
        storm.logInfo("Spout instance starting...")

    # Process the next tuple
    def nextTuple(self):
        # Loop over the lines of the file
        for line in self.f.readlines():
            # Emit the line as a one-field tuple
            storm.logInfo("Emitting %s" % line)
            storm.emit([line])

# Start the spout when it's invoked
SimSpout().run()

Solution

  • Disclaimer: I have no way to test this, so this answer is based on inspection alone.

    You did not save the file handle you opened in initialize(). This edit saves the file handle on the instance and then uses the saved handle for the read. It also fixes (I hope) some indentation that looked wrong.

    import storm

    class SimSpout(storm.Spout):
        # Not much to do here for such a basic spout
        def initialize(self, conf, context):
            # Open the file read-only and keep the handle on the instance
            self.f = open('mydata.txt', 'r')
            self._conf = conf
            self._context = context

            storm.logInfo("Spout instance starting...")

        # Process the next tuple
        def nextTuple(self):
            # readlines() returns every remaining line, so the first call
            # emits the whole file and later calls find nothing left to emit
            for line in self.f.readlines():
                # Emit the line as a one-field tuple
                storm.logInfo("Emitting %s" % line)
                storm.emit([line])

    # Start the spout when it's invoked
    SimSpout().run()
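
Storm calls nextTuple() repeatedly, so another option is to emit one line per call instead of draining the file in a single call. The sketch below shows that pattern in plain Python so it can be run without a Storm cluster. `LineSource`, `next_tuple`, the `emit` callback, and `demo` are hypothetical names used only for illustration and are not part of Storm's API. In a real spout, the body of `next_tuple` would presumably move into nextTuple() and `emit` would be storm.emit; like the code above, that part is untested.

```python
import os
import tempfile


class LineSource:
    """Hypothetical stand-in for a spout: one open file handle, one line per call."""

    def __init__(self, path, emit):
        # Open once and keep the handle on the instance, as initialize() would.
        self._f = open(path, "r")
        self._emit = emit  # hypothetical callback standing in for storm.emit

    def next_tuple(self):
        """Emit the next line, or return False once the file is exhausted."""
        if self._f is None:
            return False
        line = self._f.readline()
        if line == "":
            # readline() returns an empty string only at end of file.
            self._f.close()
            self._f = None
            return False
        self._emit([line.rstrip("\n")])
        return True


def demo():
    """Write a small temporary file and drain it one line per call."""
    emitted = []
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("first\nsecond\nthird\n")
        path = tmp.name
    try:
        source = LineSource(path, emitted.append)
        while source.next_tuple():
            pass
        # A further call after end of file is a harmless no-op.
        source.next_tuple()
    finally:
        os.remove(path)
    return emitted
```

Because the handle lives on the instance, each call resumes where the previous one stopped. Reopening the file inside every call would instead restart at the first line each time, which matches the symptom described in the question.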