Tags: python-3.x, gensim, word2vec

word2vec logging missing values


I'm using gensim version 3.8.3.

When I run build_vocab and train on Word2Vec and FastText models,
the log messages from those functions are missing their values.

For example, here is part of the log output from FastText's build_vocab:

08/09/2020 08:19:18 AM [INFO] collecting all words and their counts
08/09/2020 08:19:18 AM [INFO] PROGRESS: at sentence #%i, processed %i words, keeping %i word types
08/09/2020 08:19:18 AM [INFO] PROGRESS: at sentence #%i, processed %i words, keeping %i word types
08/09/2020 08:19:18 AM [INFO] PROGRESS: at sentence #%i, processed %i words, keeping %i word types

The numbers are missing: the %i placeholders are printed literally instead of being replaced with values.

Is there a way to fix this? Is it a bug in this version?


Solution

  • As per the discussion on the gensim project issue you opened for the same problem, this appears to be a problem with your Python installation's logging functionality, unrelated to gensim or the word2vec algorithm. In some respects that makes it more fundamental and more concerning, because it suggests that something has replaced core logging functionality with an alternative that does not substitute the arguments into the message.

    For example, if you see a similar problem with the test code...

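    import logging

    # Configure the root logger to print everything from DEBUG level up.
    logging.basicConfig(level=logging.DEBUG, format='%(asctime)s - %(levelname)s - %(filename)s:%(lineno)s - %(message)s')
    
    # With a healthy logging module the message reads:
    #   TEST A 1 B 2.00 C 3 D 4 F 5
    # If the %i / %.2f placeholders appear literally, the arguments are not being substituted.
    logging.info(
        "TEST A %i B %.2f C %.0f D %i F %i",
        1, 2, 3, 4, 5
    )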
    

    ...then the problem is in the core logging module.

    I would suggest starting from a fresh development environment – at the very least, a fresh separate Python environment (using either the core venv functionality or an environment-manager like conda), and if practical even a fresh machine/OS install.

    If the problem with the above simple test code goes away in a fresh environment, then you can incrementally reproduce the original environment by adding libraries/tools, checking for working logging after each major step, and if the problem recurs you'll have a better idea of which step introduced it.
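
The check-after-each-step approach described above can be scripted. The following is only a sketch, not part of gensim or of the original answer: the function names `logging_interpolates_args` and `describe_logging_module` and the logger name `logging_sanity_check` are made up for illustration. It captures one log message in memory to see whether the arguments are substituted, and reports where the `logging` module was loaded from, which may hint at what replaced it.

```python
import io
import logging


def logging_interpolates_args() -> bool:
    """Return True if %-style arguments are merged into the log message."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(message)s"))

    # Use a dedicated, non-propagating logger (hypothetical name) so the
    # root logger's configuration is left untouched.
    logger = logging.getLogger("logging_sanity_check")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(handler)
    try:
        logger.info("TEST A %i B %.2f", 1, 2)
    finally:
        logger.removeHandler(handler)

    # A healthy logging module substitutes both arguments.
    return stream.getvalue().strip() == "TEST A 1 B 2.00"


def describe_logging_module() -> dict:
    """Report where the logging module and its message formatter come from."""
    get_message = logging.LogRecord.getMessage
    return {
        # Path of the imported module; normally inside the standard library.
        "logging_file": getattr(logging, "__file__", None),
        # Module that defines LogRecord.getMessage; normally "logging".
        "getMessage_module": getattr(get_message, "__module__", None),
    }


if __name__ == "__main__":
    print("arguments interpolated:", logging_interpolates_args())
    for key, value in describe_logging_module().items():
        print(f"{key}: {value}")
```

Running this script after each library or tool is added to the fresh environment shows the step at which `arguments interpolated` turns to `False`. A `logging_file` path outside the standard library, or a `getMessage_module` other than `logging`, would suggest that something shadows or patches the module, though neither check is conclusive on its own.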