Using tornado ioloop for loading big python pickle file into memory


I am building a test server that loads a huge pickle file (it takes about 30 s) when an endpoint is hit. My goal is to load the pickle into memory as a Python object in the background, on a separate thread, when the Tornado web server boots. When the endpoint is hit, it should either find the object already in memory or wait until the thread has finished loading. That way the server boots much faster.

I am looking for recommendations on the best way to add async to make this work.

my_server.py

    import tornado.ioloop
    import tornado.web

    from my_class import MyClass

    class MainHandler(tornado.web.RequestHandler):
        def get(self):
            m = MyClass.get_foobar_object_by_name('foobar')
            self.write("Hello, world")

    def make_app():
        return tornado.web.Application([
            (r"/", MainHandler),
        ])

    if __name__ == "__main__":
        app = make_app()
        app.listen(8888)
        MyClass.load()  # takes 30s to load
        tornado.ioloop.IOLoop.current().start()

my_class.py

    import pickle

    class MyClass(object):
        pickle_path = '/opt/some/path/big_file.pickle'
        foobar_map = None

        @staticmethod
        def load():
            # this step takes about 30s to load
            with open(MyClass.pickle_path, 'rb') as f:
                MyClass.foobar_map = pickle.load(f)

        @staticmethod
        def get_foobar_object_by_name(foobar_name):
            if MyClass.foobar_map is None:
                MyClass.load()
            return MyClass.foobar_map.get(foobar_name)

Solution

  • The pickle module has a synchronous interface, so the only way to run it asynchronously is to run it on another thread. This can be done with the IOLoop.run_in_executor interface added in Tornado 5.0:

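    import pickle

    from tornado.ioloop import IOLoop
    from tornado.web import RequestHandler
    from tornado.locks import Lock

    class MyClass:
        pickle_path = '/opt/some/path/big_file.pickle'
        foobar_map = None
        lock = Lock()

        @staticmethod
        def _read_pickle():
            # Blocking read; meant to run on an executor thread.
            with open(MyClass.pickle_path, 'rb') as f:
                return pickle.load(f)

        @staticmethod
        async def load():
            # The Lock object itself is the async context manager (it is not called).
            async with MyClass.lock:
                # Check again inside the lock to make sure we only do this once.
                if MyClass.foobar_map is None:
                    MyClass.foobar_map = await IOLoop.current().run_in_executor(None, MyClass._read_pickle)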
    
        @staticmethod
        async def get_foobar_object_by_name(foobar_name):
            if MyClass.foobar_map is None:
                await MyClass.load()
            return MyClass.foobar_map.get(foobar_name)
    
    class MainHandler(RequestHandler):
        async def get(self):
            m = await MyClass.get_foobar_object_by_name('foobar')
            self.write("Hello, world")
    

    Note that async is contagious: anything that calls an async function also needs to be async and use await.
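
The question also asks for the load to start in the background at boot. A minimal, self-contained sketch of that idea follows. It uses only the standard library's asyncio (which Tornado 5.0+ runs on under Python 3), so it can be tried without a web server. The `BackgroundPickle` class, the `demo` function, and the tiny temporary pickle file are all hypothetical stand-ins, not part of the original code.

```python
import asyncio
import os
import pickle
import tempfile


class BackgroundPickle:
    """Hypothetical helper: loads one pickle file off the event loop thread."""

    def __init__(self, path):
        self.path = path
        self.value = None
        self.loads = 0  # counts how many times the file was actually read
        self._lock = asyncio.Lock()

    def _read(self):
        # Blocking call; runs on a worker thread, not on the event loop.
        with open(self.path, "rb") as f:
            self.loads += 1
            return pickle.load(f)

    async def get(self):
        if self.value is None:
            async with self._lock:
                # Re-check inside the lock so the file is read only once.
                if self.value is None:
                    loop = asyncio.get_running_loop()
                    self.value = await loop.run_in_executor(None, self._read)
        return self.value


async def demo():
    # Write a tiny pickle to a temporary file to stand in for the big one.
    fd, path = tempfile.mkstemp(suffix=".pickle")
    with os.fdopen(fd, "wb") as f:
        pickle.dump({"foobar": 42}, f)
    try:
        cache = BackgroundPickle(path)  # constructed inside the running loop
        # Kick off the load at "boot" without waiting for it.
        preload = asyncio.ensure_future(cache.get())
        # Two "requests" arrive while the load may still be in progress.
        results = await asyncio.gather(cache.get(), cache.get())
        await preload
        return results, cache.loads
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Both simulated requests receive the loaded object, and the load counter stays at 1 because the lock serializes the callers. In the Tornado server above, a comparable step would presumably be to schedule the coroutine before starting the loop, for example with `IOLoop.current().spawn_callback(MyClass.load)` in place of the blocking `MyClass.load()` call.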