Tags: python, machine-learning, nlp, gensim, word2vec

How to get the dimensions of a word2vec vector?


I have trained a word2vec model on my data, list_of_sentence:

from gensim.models import Word2Vec

w2v_model = Word2Vec(list_of_sentence, min_count=5, workers=4)

print(type(w2v_model))

<class 'gensim.models.word2vec.Word2Vec'>

I would like to know the dimensionality of w2v_model vectors. How can I check it?


Solution

  • The vector dimensionality is set by an argument to the Word2Vec constructor:

    • In gensim versions up to 3.8.3, the argument is called size
    • In gensim 4.0 and later, the argument is renamed to vector_size

    In both cases, the argument has a default value of 100; this means that if you do not specify it explicitly (as is the case in your code), the dimensionality will be 100.

    Here is a reproducible example using gensim 3.6:

    import gensim
    gensim.__version__
    # 3.6.0
    
    from gensim.test.utils import common_texts
    from gensim.models import Word2Vec
    
    # size is not specified, so the default of 100 is used
    model = Word2Vec(sentences=common_texts, window=5, min_count=1, workers=4)
    
    vector = model.wv['computer']  # numpy vector of a word in the corpus
    vector.shape  # verify that a single vector has 100 dimensions
    # (100,)
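
The dimensionality can also be read from the trained model without looking up any particular word. The sketch below assumes the vector_size attribute on model.wv, which to my knowledge is present in both the 3.x and 4.x series; embedding_dim and demo are hypothetical helper names, not part of gensim.

```python
def embedding_dim(model):
    """Return the dimensionality of a trained model's word vectors.

    Assumes `model.wv` is a gensim KeyedVectors object, which exposes
    the dimensionality as `vector_size`.
    """
    return model.wv.vector_size


def demo():
    """Train a tiny model on gensim's bundled toy corpus and report its dimensionality."""
    # Imported lazily so the helper above can be used without triggering training
    from gensim.models import Word2Vec
    from gensim.test.utils import common_texts

    # No size / vector_size given, so the default dimensionality applies
    model = Word2Vec(sentences=common_texts, min_count=1, workers=1)
    return embedding_dim(model)
```

Calling demo() with gensim installed should return 100, matching the shape check above. For the model in the question, embedding_dim(w2v_model) would give the same information.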
    

    If you want to change this dimensionality to, say, 256, you should call Word2Vec with the argument size=256 (for gensim versions up to 3.8.3) or vector_size=256 (for gensim versions 4.0 or later).
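
If the same code has to run under both gensim series, one option is to pick the keyword name from the installed version. This is only a minimal sketch: dimension_kwarg and train_with_dim are hypothetical helpers, and the version check assumes that gensim.__version__ starts with the major version number followed by a dot.

```python
def dimension_kwarg(gensim_version, dim):
    """Return the Word2Vec keyword argument that sets the vector dimensionality.

    The argument is named `size` up to gensim 3.8.3 and `vector_size`
    from gensim 4.0 onwards.
    """
    major = int(gensim_version.split(".")[0])
    name = "vector_size" if major >= 4 else "size"
    return {name: dim}


def train_with_dim(sentences, dim, **kwargs):
    """Train a Word2Vec model with `dim`-dimensional vectors on either gensim series."""
    # Imported lazily so dimension_kwarg can be used without gensim installed
    import gensim
    from gensim.models import Word2Vec

    kwargs.update(dimension_kwarg(gensim.__version__, dim))
    return Word2Vec(sentences=sentences, **kwargs)


print(dimension_kwarg("3.8.3", 256))  # {'size': 256}
print(dimension_kwarg("4.0.0", 256))  # {'vector_size': 256}
```

With the data from the question, the call would look like train_with_dim(list_of_sentence, 256, min_count=5, workers=4).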