I'm using gensim version 3.8.3.
When I run build_vocab and train for the Word2Vec and FastText models, the log messages from those functions are missing their values.
For example, here is part of the build_vocab log for FastText:
08/09/2020 08:19:18 AM [INFO] collecting all words and their counts
08/09/2020 08:19:18 AM [INFO] PROGRESS: at sentence #%i, processed %i words, keeping %i word types
08/09/2020 08:19:18 AM [INFO] PROGRESS: at sentence #%i, processed %i words, keeping %i word types
08/09/2020 08:19:18 AM [INFO] PROGRESS: at sentence #%i, processed %i words, keeping %i word types
The numbers are missing: the format placeholders are printed literally as %i instead of being filled in.
Is there a way to solve this? Is it a bug in this version?
As per the discussion on the gensim project issue you opened for the same problem, this appears to be a problem with your Python installation's logging functionality that is unrelated to gensim or the word2vec algorithm. In some respects, the problem is more foundational and concerning, as it suggests that core functionality has been replaced with a sloppy alternative.
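To illustrate how such a replacement could produce exactly this symptom: the standard logging machinery only substitutes the arguments into a %-style message when LogRecord.getMessage() is called, which the stock logging.Formatter does. The sketch below assumes a hypothetical broken formatter (SloppyFormatter, invented for illustration, not something found in your setup) that emits the raw record.msg instead, and compares it with the standard Formatter on the same gensim-style message.

```python
import io
import logging


class SloppyFormatter(logging.Formatter):
    """Hypothetical broken formatter: returns the raw message template
    without ever applying record.args (i.e. never calls getMessage())."""

    def format(self, record):
        return record.msg


def log_once(formatter):
    """Log one gensim-style progress message through `formatter`
    and return the text that reached the handler's stream."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(formatter)
    logger = logging.getLogger("demo")
    logger.handlers = [handler]   # replace any handler from a previous call
    logger.setLevel(logging.INFO)
    logger.propagate = False      # keep the demo output away from the root logger
    logger.info("PROGRESS: at sentence #%i, processed %i words, keeping %i word types", 0, 0, 0)
    return stream.getvalue().strip()


good = log_once(logging.Formatter("%(message)s"))
bad = log_once(SloppyFormatter())
print(good)  # PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
print(bad)   # PROGRESS: at sentence #%i, processed %i words, keeping %i word types
```

The point is only that gensim passes the values correctly as logging arguments; whether they get substituted is decided by whatever formatting code is installed in the environment.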
For example, if you see a similar problem with the test code...
import logging
logging.basicConfig(level=logging.DEBUG, format='%(asctime)s - %(levelname)s - %(filename)s:%(lineno)s - %(message)s')
logging.info(
"TEST A %i B %.2f C %.0f D %i F %i",
1, 2, 3, 4, 5
)
...then the problem is in the core logging module.
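If the test code does show the problem, a quick way to look for the culprit is to check where the logging module was actually loaded from and which handler and formatter classes end up on the root logger. This is a minimal sketch, assuming the suspect replacement shows up either as a non-standard logging module path or as a non-stdlib formatter class; describe_logging_setup is a hypothetical helper name, not part of any library.

```python
import logging


def describe_logging_setup():
    """Hypothetical helper: report where `logging` was imported from and
    the fully qualified classes of the root logger's handlers/formatters."""
    info = {"logging_module": logging.__file__, "handlers": []}
    for handler in logging.getLogger().handlers:
        fmt = handler.formatter
        info["handlers"].append({
            "handler": type(handler).__module__ + "." + type(handler).__qualname__,
            "formatter": None if fmt is None
            else type(fmt).__module__ + "." + type(fmt).__qualname__,
        })
    return info


logging.basicConfig(level=logging.INFO)
for key, value in describe_logging_setup().items():
    print(key, value)
```

In a healthy environment the module path points into the standard library and the classes are plain logging.StreamHandler and logging.Formatter; anything else is a lead worth following.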
I would suggest starting from a fresh development environment: at the very least, a fresh separate Python environment (using either the core venv functionality or an environment manager like conda), and if practical even a fresh machine/OS install.
If the problem with the above simple test code goes away in a fresh environment, then you can incrementally reproduce the original environment by adding libraries/tools, checking for working logging after each major step, and if the problem recurs you'll have a better idea of which step introduced it.
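To make the "check for working logging after each major step" part repeatable, the test code above can be turned into a small script that checks the formatted output instead of relying on eyeballing it. This is a sketch under the assumption that a correctly working logging module renders the test message as "TEST A 1 B 2.00 C 3 D 4 F 5"; logging_works is a hypothetical helper name.

```python
import io
import logging


def logging_works():
    """Hypothetical check: return True if a %-style log call comes out
    with its arguments substituted by the installed logging module."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger = logging.getLogger("env-check")
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # don't duplicate output through the root logger
    try:
        logger.info("TEST A %i B %.2f C %.0f D %i F %i", 1, 2, 3, 4, 5)
    finally:
        logger.removeHandler(handler)
    return stream.getvalue().strip() == "TEST A 1 B 2.00 C 3 D 4 F 5"


if __name__ == "__main__":
    ok = logging_works()
    print("logging OK" if ok else "logging BROKEN: placeholders not substituted")
    if not ok:
        raise SystemExit(1)  # non-zero exit so a setup script can stop here
```

Running this after each library or tool you add back makes the first failing step obvious, since the script exits with status 1 as soon as the placeholders stop being filled in.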