I don't understand the purpose of .text in spaCy code


I have the following code:

    from spacy.lang.en import English

    nlp = English()

    # Process the text
    doc = nlp("I like tree kangaroos and narwhals.")

    # Select the first token
    first_token = doc[0]

    # Print the first token's text
    print(first_token.text)

My problem with the .text at the end of the code is that everything works fine even if I omit it. I have seen .text many times in spaCy code, but I don't understand what it does. My question is simple: what is .text doing here?


Solution

  • Note that doc[0] is a Token, not a string.

    Accessing .text returns the string that the Token object holds. A Token has plenty of other attributes, too.

    When a Token object is printed, its representation is just its text (see the source code). That's why first_token and first_token.text look the same when you print them.

    Power user stuff; skip if you want: If you want to see how Token and string objects behave differently, try concatenating two Tokens with +, or comparing them for equality. Concatenation raises a TypeError, and equality between Tokens is not based on their text, so two Tokens with the same text taken from different Docs do not compare equal. Compare their .text values when you want string equality.
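
To make the point about printing concrete, here is a minimal sketch that runs without spaCy. MiniToken is a hypothetical stand-in class invented for illustration; it is not spaCy's Token and does not reproduce its implementation. It only mimics the one behavior described above: printing the object shows its text, while the object itself is still not a string.

```python
class MiniToken:
    """Hypothetical stand-in for a spaCy Token (illustration only)."""

    def __init__(self, text, i):
        self.text = text  # the plain string the token holds
        self.i = i        # the token's position in its document

    def __str__(self):
        # Printing the object shows only its text.
        return self.text


first_token = MiniToken("I", 0)

# Both lines print the same characters, which is why omitting .text
# appears to change nothing.
print(first_token)
print(first_token.text)

# The object is not a string; only its .text attribute is.
is_object_str = isinstance(first_token, str)
is_text_str = isinstance(first_token.text, str)

# String operations work on .text but fail on the object itself.
joined = first_token.text + " like"
try:
    first_token + " like"
    concat_failed = False
except TypeError:
    concat_failed = True
```

With real spaCy you can run the same kind of check on doc[0]: type(first_token) and type(first_token.text) report different types even though both print identically.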