python amazon-web-services amazon-s3 boto botocore

botocore >= 1.28.0 slower in multithread application


The official Boto3 docs recommend creating a new resource per thread: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/resources.html#multithreading-or-multiprocessing-with-resources

Botocore 1.28.0 merged a feature which appears to generate a list of all possible endpoints on resource creation: https://github.com/boto/botocore/pull/2785

I have a test suite that uses motoserver and an application that relies heavily on parallelized downloads from and uploads to S3 from a process pool. With botocore 1.28.0, the test suite takes an extra 20 minutes to run compared to the previous version.

I've done some profiling with cProfile and can confirm that at least half of the additional time is spent inside botocore's load_service_model method, which is called during botocore client creation. I haven't tracked down the other ~50% of the extra time yet, but it's somewhere in botocore usage.
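
For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of profiling a single call with cProfile. The helper name `profile_call` is hypothetical and not part of botocore or boto3; it simply runs any zero-argument callable under the profiler and returns a report sorted by cumulative time.

```python
import cProfile
import io
import pstats


def profile_call(fn, top=15):
    """Run fn() under cProfile; return (fn's result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(fn)

    # Render the `top` most expensive entries by cumulative time into a string.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()
```

Assuming boto3 is installed, something like `_, report = profile_call(lambda: boto3.client("s3"))` followed by `print(report)` should show whether load_service_model dominates client creation in your environment.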

What can I do to speed this up again with the version bump?


Solution

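  • Use a single pre-loaded Loader instance shared by all sessions, so the service data files are loaded once instead of on every client or resource creation, e.g.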

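    import threading

    import boto3
    import botocore.session
    from botocore.loaders import Loader

    # One shared Loader; results of load_service_model are cached on the instance.
    preloader = Loader()

    # Warm the cache with the S3 data files that are loaded on client creation.
    for type_name in ('endpoint-rule-set-1', 'paginators-1'):
        preloader.load_service_model(service_name='s3', type_name=type_name)

    session_lock = threading.Lock()

    def _session():
        # Fresh botocore session that reuses the pre-loaded data loader.
        session = botocore.session.get_session()
        session.register_component('data_loader', preloader)
        # Serialize boto3 Session construction across threads.
        with session_lock:
            return boto3.session.Session(botocore_session=session)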
    

    Then in your threads you can use:

    session = _session()
    resource = session.resource(...)
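
If each worker thread handles many tasks, it may also help to build the resource only once per thread rather than once per task. The following is a minimal sketch of that idea using `threading.local`, under the assumption that your workers are long-lived pool threads. The names `per_thread`, `demo`, and `make_fake_resource` are hypothetical, and the fake factory stands in for `_session().resource('s3')` so the example runs without boto3.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_local = threading.local()  # each thread sees its own attributes


def per_thread(factory):
    """Return the calling thread's cached object, building it on first use."""
    obj = getattr(_local, "obj", None)
    if obj is None:
        obj = factory()
        _local.obj = obj
    return obj


def demo(workers=4, tasks=100):
    """Count how many times the factory runs when many tasks share a pool."""
    calls = []
    calls_lock = threading.Lock()

    def make_fake_resource():
        # Stand-in for: _session().resource('s3')
        with calls_lock:
            calls.append(threading.get_ident())
        return object()

    def task(_):
        return per_thread(make_fake_resource)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(task, range(tasks)))
    return len(calls)
```

Here `demo()` returns the number of factory calls, which is at most the number of worker threads regardless of how many tasks run. In real code the factory would be something like `lambda: _session().resource('s3')`, which still gives every thread its own resource as the Boto3 docs recommend.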