I'm just trying to mark subparts of a document as spans, as per spaCy's documentation:
import spacy
nlp = spacy.load('en_core_web_sm')
sentence = "The car with the white wheels was being confiscated by the police when the owner returns from robbing a bank"
doc = nlp(sentence)
doc.spans['remove_parts'] = [doc[2:6], doc[9:12]]
doc.spans['remove_parts']
This looks pretty straightforward, but spaCy raises the following error (pointing at the assignment to doc.spans):
AttributeError: 'spacy.tokens.doc.Doc' object has no attribute 'spans'
I can't see what's going on at all. Is this a spaCy bug? Has the spans property been removed even though it is still in the documentation? If not, what am I missing?
PS: I'm using Colab for this, and spacy.info() shows:
spaCy version 2.2.4
Location /usr/local/lib/python3.7/dist-packages/spacy
Platform Linux-4.19.112+-x86_64-with-Ubuntu-18.04-bionic
Python version 3.7.10
Models en
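To see which spaCy the notebook kernel itself imports (and from where), and whether that version can have doc.spans at all, a check like the following may help. This is a minimal sketch: the helper names are hypothetical, and the only assumption about spaCy is that Doc.spans exists from v3.0 onwards and that the package exposes spacy.__version__.

```python
import importlib.util
import re


def spacy_supports_doc_spans(version: str) -> bool:
    """Hypothetical helper: True if a spaCy version string is 3.0 or later.

    Assumes Doc.spans first appeared in spaCy v3.0, so e.g. "2.2.4" -> False.
    """
    match = re.match(r"\d+", version)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version!r}")
    return int(match.group(0)) >= 3


def describe_spacy_install() -> str:
    """Hypothetical helper: report where this interpreter imports spaCy from."""
    spec = importlib.util.find_spec("spacy")
    if spec is None:
        return "spaCy is not importable from this interpreter"
    import spacy  # imported lazily so the version helper works without spaCy

    ok = spacy_supports_doc_spans(spacy.__version__)
    return f"spaCy {spacy.__version__} from {spec.origin} (Doc.spans available: {ok})"


if __name__ == "__main__":
    print(describe_spacy_install())
```

If the printed path is not inside the environment you expect, the kernel is picking up a different installation.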
This code:
from spacy.lang.en import English
nlp = English()
text = "The car with the white wheels was being confiscated by the police when the owner returns from robbing a bank"
doc = nlp(text)
doc.spans['remove_parts'] = [doc[2:6], doc[9:12]]
doc.spans['remove_parts']
should work correctly from spaCy v3.0 onwards. Doc.spans was introduced in v3.0, and the spacy.info() output above reports v2.2.4, which is why the attribute is missing. If it still doesn't work after upgrading, can you verify that you are in fact running the code from the correct virtual environment within Colab (and not a different environment using spaCy v2)? We have previously seen issues where Colab would still pick up older spaCy installations on the system instead of sourcing the code from the correct venv. To double-check, you can try running the code in a Python console directly instead of through Colab.
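On spaCy v3, the value stored under a key in doc.spans is a SpanGroup whose entries are ordinary Span objects. A minimal sketch, assuming spaCy v3.0 or later, that lists the stored spans and, since the key is called remove_parts, rebuilds the text without them. text_without_spans is a hypothetical helper for illustration, not part of spaCy's API.

```python
# Assumes spaCy v3.0+; on v2.x, doc.spans raises AttributeError.
from spacy.lang.en import English

nlp = English()  # blank English pipeline: tokenizer only, no trained model needed
text = ("The car with the white wheels was being confiscated by the police "
        "when the owner returns from robbing a bank")
doc = nlp(text)

# doc.spans maps a string key to a SpanGroup of Span objects
doc.spans["remove_parts"] = [doc[2:6], doc[9:12]]

for span in doc.spans["remove_parts"]:
    # each entry is a regular Span with token offsets and text
    print(span.start, span.end, span.text)


def text_without_spans(doc, key):
    """Hypothetical helper: rebuild the text, skipping tokens covered by doc.spans[key]."""
    covered = set()
    for span in doc.spans[key]:
        covered.update(range(span.start, span.end))
    # text_with_ws keeps each token's original trailing whitespace
    return "".join(tok.text_with_ws for tok in doc if tok.i not in covered)


print(text_without_spans(doc, "remove_parts"))
```

Working on token indices rather than character offsets keeps the removal aligned with how the spans were defined in the first place.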