I was running this section of code in a basic venv (Python 3.9):
    if collection.count() == 0:
        collection.add(
            documents=documents,
            metadatas=metadatas,
            ids=ids
        )
and it took about 11 minutes to execute. We later needed to deploy it in a Conda environment (Python 3.10), and the same code on the same machine took only 4 minutes. Is there a specific reason for this? The package I am using is ChromaDB, and this section of code creates a vector database.
Machine: Mac Pro M1 Pro, 32 GB
I looked at the version differences and am aware of improvements such as faster memory management, but that doesn't feel like a full explanation of the speedup.
Generally speaking, the interpreter version alone is unlikely to explain a difference this large. Python 3.10 does not include a Just-In-Time (JIT) compiler; a JIT that translates Python bytecode into machine code at runtime is only available if you use an alternative interpreter such as PyPy. The Conda environment is the more likely factor: Conda resolves and installs its own builds of compiled dependencies, and different versions or builds of those libraries can affect speed heavily.
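One way to check what actually differs between the two environments is to print the interpreter and package details in each and compare them side by side. The following is a minimal sketch, not a definitive diagnostic: `environment_report` is a hypothetical helper, and the package names `chromadb`, `numpy`, and `onnxruntime` are assumptions about which dependencies matter here.

```python
import platform
import sys
from importlib import metadata


def environment_report(packages=("chromadb", "numpy", "onnxruntime")):
    """Collect interpreter and package details for a side-by-side comparison.

    The default package names are assumptions; pass whichever
    dependencies are relevant to your project.
    """
    report = {
        "python": sys.version.split()[0],
        "implementation": platform.python_implementation(),
        # Architecture the interpreter runs as, e.g. 'arm64' or 'x86_64'
        "machine": platform.machine(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Run the script once in the venv and once in the Conda environment. A different `machine` value or different versions of the compiled dependencies would be a stronger lead than the Python version itself.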
But the most reliable way to find the root of the difference is to profile the code in both environments. For example:
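    import cProfile

    profiler = cProfile.Profile()
    profiler.enable()

    # Code under test: populate the collection only if it is empty
    if collection.count() == 0:
        collection.add(
            documents=documents,
            metadatas=metadatas,
            ids=ids
        )

    profiler.disable()
    profiler.print_stats(sort="cumulative")  # sort by cumulative time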
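The raw profiler output can be long, so it may help to keep only the most expensive entries and compare those between the two environments. Below is a small sketch using the standard `pstats` module. `profile_call` is a hypothetical helper, and `build_index` is a hypothetical stand-in for the slow `collection.add(...)` call so that the example runs on its own.

```python
import cProfile
import io
import pstats


def build_index(n=200_000):
    """Hypothetical stand-in for the slow call (collection.add in the question)."""
    return sum(i * i for i in range(n))


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile.

    Returns the function's result and a text report limited to the
    `top` entries, sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args, **kwargs)
    finally:
        profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


if __name__ == "__main__":
    _, report = profile_call(build_index)
    print(report)
```

In the real code you would wrap the `collection.add(...)` call in a function and pass it to `profile_call`, then compare the top entries from the venv run against those from the Conda run to see which calls account for the extra time.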