I'm trying to learn Gensim from its website. There is a function named 'remove_stopword_tokens' that would be useful for my research. Although the function is documented on their website (exact link: link), I can't import it in my Colab notebook.
Note: This is my code:
import gensim
from gensim.parsing.preprocessing import remove_stopword_tokens
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-2-dbd838c83237> in <module>
----> 1 from gensim.parsing.preprocessing import remove_stopword_tokens
ImportError: cannot import name 'remove_stopword_tokens' from 'gensim.parsing.preprocessing' (/usr/local/lib/python3.7/dist-packages/gensim/parsing/preprocessing.py)
---------------------------------------------------------------------------
NOTE: If your import is failing due to a missing package, you can
manually install dependencies using either !pip or !apt.
To view examples of installing some common dependencies, click the
"Open Examples" button below.
updated & corrected answer
You've run into a limitation of Google Colab: it may not have the most recent version of libraries.
You can see this by checking the value of gensim.__version__. In my check of Google Colab right now (September 2022), it reports 3.6.0, a version of Gensim that's about 4 years old and lacks later fixes & additions. The remove_stopword_tokens() function was only added recently.
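To confirm which version the runtime actually has, a minimal sketch along these lines may help. It assumes Python 3.8+ (for `importlib.metadata`), and the helper name `installed_version` is made up for illustration; printing `gensim.__version__` after an import gives the same information.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# On a Colab runtime that has not been upgraded, this may show an old release.
print("gensim version:", installed_version("gensim"))
```

Reading the package metadata this way does not import gensim, so it can be run before deciding whether an upgrade and runtime restart are needed.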
Fortunately, you can update the gensim package backing the Colab notebook yourself, using a shell-escape to run pip. Inside a Colab cell, run:
!pip install gensim -U
If you'd already done an import gensim, it will warn you that you must restart the runtime for the new code to be found.
Note that, for clarity, you might prefer more-specific imports, as many project style guides suggest, rather than doing any broad top-level import gensim at all. Just import the individual classes and/or functions you need, specifically & explicitly. That is, just:
from gensim.parsing.preprocessing import remove_stopword_tokens
# ... other exact class/function/variable imports you'll use...
remove_stopword_tokens(sentence)
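If upgrading is not an option, a guarded import is one way to keep a notebook working on both old and new releases. This is only a sketch, not Gensim's own code: the fallback function, the sample tokens, and the one-word stopword set are made up for illustration, and it assumes the library function accepts a list of tokens plus an optional `stopwords` collection.

```python
try:
    # Present in recent Gensim releases; missing in the old 3.6.0 on Colab.
    from gensim.parsing.preprocessing import remove_stopword_tokens
except ImportError:
    # Hypothetical stand-in used when Gensim is too old or not installed.
    # Unlike the library version, it has no default stopword list.
    def remove_stopword_tokens(tokens, stopwords=None):
        """Return the tokens that are not in `stopwords`."""
        stopwords = stopwords or frozenset()
        return [token for token in tokens if token not in stopwords]


# Illustrative input: the function works on a list of tokens, not a raw string.
tokens = ["the", "quick", "brown", "fox"]
filtered = remove_stopword_tokens(tokens, stopwords={"the"})
print(filtered)
```

Passing an explicit `stopwords` set keeps the result the same whichever branch is taken; relying on the default list would only work with the real library function.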
On the other hand, if you want things simple-but-sloppy (not recommended), once you import gensim, it has already (via its own custom initialization routines) imported all of its submodules for you. So you could do:
import gensim # parsing & all gensim's other submodules now referenceable!
gensim.parsing.preprocessing.remove_stopword_tokens(sentence)
(Experienced Python programmers tend to avoid this latter approach of prefixing every call in the actual code with a long dot-path.)