The official Boto3 docs recommend creating a new resource per thread: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/resources.html#multithreading-or-multiprocessing-with-resources
Botocore 1.28.0 merged a feature which appears to generate a list of all possible endpoints on resource creation: https://github.com/boto/botocore/pull/2785
I have a test suite which uses motoserver and an application that relies heavily on parallelized downloads from and uploads to S3 from a process pool. With botocore 1.28.0, the test suite takes an extra 20 minutes to run compared to the previous version.
I've done some work with cProfile and I can confirm that at least half of the additional time is spent inside botocore's load_service_model method, called during botocore client creation. I haven't tracked down the other ~50% of the extra time yet, but it's somewhere in botocore usage.
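For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of profiling a single call with cProfile and printing the most expensive functions by cumulative time. The helper name profile_call and the stand-in function are hypothetical and not part of my test suite; in practice you would pass in whatever creates the client or resource.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile; return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Cumulative time includes time spent in callees, which is what exposes
    # an expensive call buried several frames below client creation.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def create_client_stand_in():
    # Hypothetical stand-in for the real client/resource creation call,
    # e.g. boto3.session.Session().client("s3") in an actual test suite.
    return sum(i * i for i in range(10_000))


if __name__ == "__main__":
    _, report = profile_call(create_client_stand_in)
    print(report)
```

Swapping the stand-in for the real creation call should show whether load_service_model dominates the report in your environment.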
What can I do to speed this up again with the version bump?
Use a single pre-loaded loader instance, e.g.
    import threading
    import boto3
    import botocore.session
    from botocore.loaders import Loader

    # Load the S3 service data once so the shared loader can reuse it.
    preloader = Loader()
    for type_name in frozenset(['endpoint-rule-set-1', 'paginators-1']):
        preloader.load_service_model(service_name='s3', type_name=type_name)

    session_lock = threading.Lock()

    def _session():
        session = botocore.session.get_session()
        # Replace the session's default loader with the pre-loaded one.
        session.register_component('data_loader', preloader)
        with session_lock:
            return boto3.session.Session(botocore_session=session)
Then in your threads you can use:
    session = _session()
    resource = session.resource(...)
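If each worker thread handles many items, one way to avoid rebuilding the resource for every item is to cache it per thread. The following is only a sketch under that assumption: thread_resource, run_in_threads and the factory argument are hypothetical names, and the factory stands in for something like `lambda: _session().resource('s3')`.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Each thread sees its own attributes on this object.
_thread_state = threading.local()


def thread_resource(factory):
    """Return the calling thread's resource, building it on first use."""
    resource = getattr(_thread_state, "resource", None)
    if resource is None:
        resource = factory()
        _thread_state.resource = resource
    return resource


def run_in_threads(task, items, factory, max_workers=4):
    """Apply task(resource, item) to every item, with one resource per worker thread."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(
            pool.map(lambda item: task(thread_resource(factory), item), items)
        )
```

With this shape the factory runs at most once per worker thread, so resources are never shared across threads, which is in line with the per-thread recommendation in the Boto3 docs.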