python google-colaboratory word2vec

"IOPub data rate exceeded" error in Google Colab


I'm trying to print the words from a dataset of 8483448 bytes on Google Colab, but I'm getting this error:

words = list(model.wv.vocab)
print('this vocabulary for corpus')
print(words)

ERROR:

IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_data_rate_limit`.

Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)

Thanks in advance for any help fixing this error.


Solution

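  • Given the error, you seem to be hitting a limit that Google Colab's notebook server places on how fast output can be sent to the browser (the iopub_data_rate_limit of 1000000 bytes/sec shown in the message).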

    Try printing len(model.wv.vocab) first to get a sense of how large an output you're trying to display. It may not be practical to show in a notebook cell!

    If you just need a peek at some of the large vocabulary, print a small subset, for example print(words[0:10]).

    Note also: in recent Gensim versions (>=4.0.0), the .vocab dictionary has been removed. Instead, a list of all known tokens (words), usually in descending-frequency order, is available as model.wv.index_to_key. (So, in gensim-4.0.0 and up, you could look at the 100 most frequent tokens with print(model.wv.index_to_key[0:100]).)
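
The suggestions above can be combined into one small helper. This is only a sketch: vocabulary_tokens, preview and fake_model are hypothetical names made up for illustration, and fake_model is a stand-in object so the snippet runs without Gensim installed. In the notebook you would pass your real trained model instead.

```python
from types import SimpleNamespace


def vocabulary_tokens(model):
    """Return the model's known tokens as a list, across Gensim versions."""
    wv = model.wv
    # Gensim >= 4.0.0 exposes the tokens as the list `index_to_key`.
    if hasattr(wv, "index_to_key"):
        return list(wv.index_to_key)
    # Older Gensim versions keep them as the keys of the `vocab` dict.
    return list(wv.vocab)


def preview(tokens, limit=10):
    """Build a short summary string instead of dumping the whole list."""
    shown = min(limit, len(tokens))
    return f"{len(tokens)} tokens, first {shown}: {tokens[:limit]}"


# Hypothetical stand-in for a trained Word2Vec model (assumption: it only
# mimics the `wv.index_to_key` attribute used above).
fake_model = SimpleNamespace(
    wv=SimpleNamespace(index_to_key=["the", "of", "and", "to"])
)

words = vocabulary_tokens(fake_model)
print(preview(words, limit=3))  # prints: 4 tokens, first 3: ['the', 'of', 'and']
```

Printing only the count and a short slice keeps the cell output small, which should stay well under the data rate limit quoted in the error message.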