I want to optimize gensim to run doc2vec on Windows 7.
[1] C compiler
I installed gensim by following these instructions: https://radimrehurek.com/gensim/install.html
pip install --upgrade gensim
However, this page (https://radimrehurek.com/gensim/models/doc2vec.html) says that a C compiler is needed before installing gensim:
Make sure you have a C compiler before installing gensim, to use optimized (compiled) doc2vec training (70x speedup [blog]).
[2] BLAS
The tutorial at https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/doc2vec-lee.ipynb says:
Time to Train
If the BLAS library is being used, this should take no more than 3 seconds. If the BLAS library is not being used, this should take no more than 2 minutes, so use BLAS if you value your time.
So it seems I have to install BLAS to get the optimized version, but I have no idea what BLAS is, and the few BLAS installation guides I can find for Windows are complicated.
It's not just BLAS that gensim's optimized code needs, but native-compiled libraries based on Cython code.
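To find out whether an existing install already has those compiled routines, one quick check is gensim's FAST_VERSION flag, which is -1 when only the slow pure-Python code path is available. The snippet below is a minimal sketch: the helper name describe_fast_version is made up for illustration, and the exact meaning of the non-negative values is an assumption beyond "compiled code is in use".

```python
# Hypothetical helper for illustration; only FAST_VERSION itself comes from gensim.
def describe_fast_version(value):
    """Translate gensim's FAST_VERSION flag into a readable status."""
    if value < 0:
        # -1 signals that the compiled extensions could not be loaded.
        return "slow pure-Python fallback (no compiled extensions)"
    return "optimized compiled routines available (FAST_VERSION=%d)" % value


if __name__ == "__main__":
    try:
        from gensim.models.word2vec import FAST_VERSION
    except ImportError as err:
        # gensim is missing, or its compiled modules failed to import.
        print("gensim could not be imported:", err)
    else:
        print(describe_fast_version(FAST_VERSION))
```

If this reports the slow fallback, reinstalling gensim in an environment that provides the compiled extensions (for example the conda setup described below) is probably the next thing to try.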
If at all possible, this kind of work should be done on UNIX-like systems (Linux/MacOS), because that's where most of the open-source libraries are most developed, tested, and used. So you'll be closer to the system configurations of the primary developers, and larger user community – meaning default installation instructions are more likely to "just work", and any problems you run into are more likely to have existing answers in findable places.
But if you're trapped using Windows, the 'conda' distribution of Python generally does a good job of installing optimized versions of the key libraries on Windows, so it can be a good choice. I especially like to start with the 'miniconda' variant, so that only the exact packages I explicitly need are installed into an environment.
The Miniconda installation instructions and getting-started-guide are both quite good. Generally once you are in a conda environment you can conda install PACKAGENAME for major foundational packages like numpy or scipy, and still choose to pip install PACKAGENAME for anything that's not in the conda repositories, or not as up-to-date in the conda repositories. (Sometimes it makes sense to get gensim from pip even if otherwise using a conda-based environment.)
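Once the environment is set up, a rough way to see which BLAS library numpy was built against is to capture the output of numpy.show_config() and look for familiar library names. This is only a heuristic sketch: the helper names and the keyword list are assumptions chosen for illustration, and an empty result does not prove that no BLAS is present.

```python
import contextlib
import io

# Assumed list of common BLAS implementation names; not exhaustive.
KNOWN_BLAS_NAMES = ("mkl", "openblas", "blis", "atlas", "accelerate")


def find_blas_names(config_text):
    """Return the known BLAS names mentioned in a build-config dump."""
    lowered = config_text.lower()
    return [name for name in KNOWN_BLAS_NAMES if name in lowered]


def numpy_config_text():
    """Capture what numpy.show_config() prints as a single string."""
    import numpy

    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        numpy.show_config()
    return buffer.getvalue()


if __name__ == "__main__":
    try:
        found = find_blas_names(numpy_config_text())
    except ImportError as err:
        print("numpy could not be imported:", err)
    else:
        print("BLAS libraries mentioned in numpy's config:", found or "none found")
```

In a conda environment, the list will often include mkl or openblas, depending on which numpy build conda selected.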