python json nlp nltk spacy

What is the meaning of heads in spaCy training data?


I'm trying to train a model on my own data using the spaCy library.

But I'm confused by the comment "# index of token head" in a code example.

What exactly does "heads" mean here?

# training data: texts, heads and dependency labels
# for no relation, we simply chose an arbitrary dependency label, e.g. '-'
TRAIN_DATA = [
    (
        "find a cafe with great wifi",
        {
            "heads": [0, 2, 0, 5, 5, 2],  # index of token head
            "deps": ["ROOT", "-", "PLACE", "-", "QUALITY", "ATTRIBUTE"],
        },
    ),
]

Solution

  • In your example, the task is to predict a dependency tree for each sentence. For each word, the tree records the "head" word it is attached to and the type of attachment (the dependency label). One common format for describing such trees is CoNLL-U.

    For instance, "great" (token 4, counting from 0) is attached to "wifi" (token 5), and the relation says that "great" is a quality of "wifi". Therefore the entry at index 4 of heads is 5, and the entry at index 4 of deps is "QUALITY". The root token "find" is its own head, which is why the entry at index 0 of heads is 0.
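
    To see every attachment at once, the heads and deps lists can be lined up with the tokens. The following is only a minimal sketch in plain Python, not spaCy's own API. It assumes that splitting on whitespace gives the same tokens spaCy's tokenizer would produce for this sentence, and the helper name describe_arcs is made up for illustration.

```python
text = "find a cafe with great wifi"
heads = [0, 2, 0, 5, 5, 2]  # heads[i] is the index of the token that token i attaches to
deps = ["ROOT", "-", "PLACE", "-", "QUALITY", "ATTRIBUTE"]

# Assumption: whitespace splitting matches the tokenizer's output for this sentence.
tokens = text.split()


def describe_arcs(tokens, heads, deps):
    """Hypothetical helper: return (token, label, head_token) triples."""
    return [(tok, dep, tokens[head]) for tok, head, dep in zip(tokens, heads, deps)]


arcs = describe_arcs(tokens, heads, deps)
for tok, dep, head in arcs:
    print(f"{tok:<6} --{dep}--> {head}")
```

    Each printed line reads as "token, label, head": "great" points to "wifi" with the label "QUALITY", and "find" points to itself because it is the root.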