Search code examples
pythongoogle-colaboratoryword-embedding

How to use GloVe word-embeddings file on Google colaboratory


I have downloaded the data with wget

!wget http://nlp.stanford.edu/data/glove.6B.zip
 - ‘glove.6B.zip’ saved [862182613/862182613]

It is saved as zip and I would like to use glove.6B.300d.txt file from the zip file. What I want to achieve is :

embeddings_index = {}
with io.open('glove.6B.300d.txt', encoding='utf8') as f:
    for line in f:
        values = line.split()
        word = values[0]
        coefs = np.asarray(values[1:],dtype='float32')
        embeddings_index[word] = coefs

Of course I am having this error:

IOErrorTraceback (most recent call last)
<ipython-input-47-d07cafc85c1c> in <module>()
      1 embeddings_index = {}
----> 2 with io.open('glove.6B.300d.txt', encoding='utf8') as f:
      3     for line in f:
      4         values = line.split()
      5         word = values[0]

IOError: [Errno 2] No such file or directory: 'glove.6B.300d.txt'

How can I unzip and use that file in my code above on Google colab?


Solution

  • Its simple, checkout this older post from SO.

    import zipfile
    zip_ref = zipfile.ZipFile(path_to_zip_file, 'r')
    zip_ref.extractall(directory_to_extract_to)
    zip_ref.close()