I have the following code:
from spacy.lang.en import English
nlp = English()
# Process the text
doc = nlp("I like tree kangaroos and narwhals.")
# Select the first token
first_token = doc[0]
# Print the first token's text
print(first_token.text)
What confuses me is the .text at the end of the code: even if I omit it, everything works fine. I have seen .text many times in spaCy code, but I don't understand what it is doing. My question is simple: what does .text do?
Note that doc[0] is a Token, not a string. Using .text returns the string that your Token object holds. A Token has plenty of other attributes too, such as .i (its index in the Doc), .is_alpha and .is_punct.

When a Token object is printed, its representation is just the text (see the source code): print() converts the object with str(), and Token's __str__ returns the token's text. That's why the output looks the same when you print first_token and first_token.text.
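A minimal sketch of the difference, assuming spaCy v3 and the same blank English() pipeline as in the question. It shows that printing gives the same output while the types differ, and that only the str from .text behaves like a string:

```python
from spacy.lang.en import English

nlp = English()  # blank English pipeline: tokenizer only
doc = nlp("I like tree kangaroos and narwhals.")
first_token = doc[0]

# Both lines print "I": print() calls str(), and str() of a Token is its text
print(first_token)
print(first_token.text)

# ...but the two objects have different types
print(type(first_token).__name__)       # Token
print(type(first_token.text).__name__)  # str

# .text is a plain Python str, so str methods and + work on it
print(first_token.text.upper() + " " + doc[1].text)  # I like

# A Token also carries attributes besides its text
print(first_token.i, first_token.is_alpha, first_token.is_punct)  # 0 True False
print(doc[-1].text, doc[-1].is_punct)  # . True
```

In short: use the Token when you need its linguistic attributes, and .text when you need an actual string (for joining, formatting, dictionary keys and so on).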
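Power user stuff; skip if you want: if you want to see how Token objects behave differently from strings, try concatenating two Tokens with +, or comparing them for equality. Token doesn't support +, so concatenation raises a TypeError. Equality isn't based on the text either: in current spaCy versions, Token's comparison methods (Cython's __richcmp__ in token.pyx) look at the token's Doc and its position within it, so two tokens with the same text at different positions compare unequal, even though their .text strings are equal.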
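A small sketch of that experiment, again assuming spaCy v3 with a blank English() pipeline. The sentence "dog bites dog" is just an illustrative input chosen so that two tokens share the same text:

```python
from spacy.lang.en import English

nlp = English()
doc = nlp("dog bites dog")
a, b = doc[0], doc[2]  # two different tokens whose text is "dog"

# The strings have the same characters, so they compare equal...
print(a.text == b.text)  # True

# ...but the Tokens sit at different positions in the Doc, so they do not
print(a == b)  # False

# Token does not implement +, so concatenating Tokens fails
try:
    a + b
except TypeError as err:
    print("TypeError:", err)

# Concatenating the .text strings works as expected
print(a.text + " " + b.text)  # dog dog
```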