I'm trying to use pre-trained word2vec in Google Colab. I previously downloaded the model to my C: drive and then uploaded it to my Google Drive. However, I get an error that I can't find explained anywhere.
My code is:
import gensim  # the code below calls gensim.models.KeyedVectors, so import the top-level package
import urllib.request
urllib.request.urlretrieve("https://drive.google.com/file/d/1lgCddPxJC__QA-qGtYTdNNoHRiYWyOpQ/view?usp=sharing/GoogleNews-vectors-negative300.bin", "GoogleNews-vectors-negative300.bin")
word2vec_path = 'GoogleNews-vectors-negative300.bin'
word2vec = gensim.models.KeyedVectors.load_word2vec_format(word2vec_path, binary=True)
Error Message:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-354-492ef9dcbbcc> in <module>()
1 word2vec_path = 'GoogleNews-vectors-negative300.bin'
----> 2 word2vec = gensim.models.KeyedVectors.load_word2vec_format(word2vec_path, binary=True)
2 frames
/usr/local/lib/python3.7/dist-packages/gensim/models/utils_any2vec.py in <genexpr>(.0)
171 with utils.smart_open(fname) as fin:
172 header = utils.to_unicode(fin.readline(), encoding=encoding)
--> 173 vocab_size, vector_size = (int(x) for x in header.split()) # throws for invalid file format
174 if limit:
175 vocab_size = min(vocab_size, limit)
ValueError: invalid literal for int() with base 10: '<!DOCTYPE'
As user @deceze notes, that error indicates the file starts with typical HTML boilerplate (`<!DOCTYPE`) where the code expects two `int`s declaring the forthcoming count of vectors (`vocab_size`) and their dimensionality (`vector_size`).
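To confirm this diagnosis before handing the file to gensim, you can read its first line yourself. The sketch below assumes a plain local file path; `read_word2vec_header` is a hypothetical helper name, not part of gensim. It mirrors the header check shown in the traceback (two whitespace-separated integers on the first line) but raises a more descriptive error when the file is actually an HTML page.

```python
def read_word2vec_header(path):
    """Return (vocab_size, vector_size) parsed from a word2vec-format file.

    Hypothetical helper: raises ValueError with a clearer message when the
    file looks like an HTML page rather than a word2vec model.
    """
    # The header is a single ASCII text line, even in the binary format,
    # so reading up to the first newline is enough.
    with open(path, "rb") as fin:
        first_line = fin.readline()
    text = first_line.decode("utf-8", errors="replace").strip()

    # An HTML response (error page, Drive preview page, etc.) starts like this.
    if text.lower().startswith(("<!doctype", "<html")):
        raise ValueError(
            f"{path!r} starts with HTML ({text[:40]!r}); the download probably "
            "returned a web page instead of the model file"
        )

    # A valid header is exactly two non-negative integers.
    parts = text.split()
    if len(parts) != 2 or not all(p.isdigit() for p in parts):
        raise ValueError(f"unexpected word2vec header in {path!r}: {text[:40]!r}")
    vocab_size, vector_size = (int(p) for p in parts)
    return vocab_size, vector_size
```

For example, `read_word2vec_header('GoogleNews-vectors-negative300.bin')` should return the vocabulary size and dimensionality for a genuine download (the GoogleNews vectors are described as 3 million words of 300 dimensions), and raise the HTML error for a file like the one your `urlretrieve()` call saved.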
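It's likely your `urlretrieve()` call didn't receive the file you expected and got an HTML page instead, perhaps a 'file not found' or other error page. (A Google Drive link ending in `/view?usp=sharing` points to Drive's web preview page, not to the raw file contents.) So: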