Search code examples
nlpstanford-nlp

gender identification in natural language processing


I have written below code using stanford nlp packages.

GenderAnnotator myGenderAnnotation = new GenderAnnotator();
myGenderAnnotation.annotate(annotation);

But for the sentence "Annie goes to school", it is not able to identify the gender of Annie.

The output of application is:

     [Text=Annie CharacterOffsetBegin=0 CharacterOffsetEnd=5 PartOfSpeech=NNP Lemma=Annie NamedEntityTag=PERSON] 
     [Text=goes CharacterOffsetBegin=6 CharacterOffsetEnd=10 PartOfSpeech=VBZ Lemma=go NamedEntityTag=O] 
     [Text=to CharacterOffsetBegin=11 CharacterOffsetEnd=13 PartOfSpeech=TO Lemma=to NamedEntityTag=O] 
     [Text=school CharacterOffsetBegin=14 CharacterOffsetEnd=20 PartOfSpeech=NN Lemma=school NamedEntityTag=O] 
     [Text=. CharacterOffsetBegin=20 CharacterOffsetEnd=21 PartOfSpeech=. Lemma=. NamedEntityTag=O]

What is the correct approach to get the gender?


Solution

  • If your named entity recognizer outputs PERSON for a token, you might use (or build if you don't have one) a gender classifier based on first names. As an example, see the Gender Identification section from the NLTK library tutorial pages. They use the following features:

    • Last letter of name.
    • First letter of name.
    • Length of name (number of characters).
    • Character unigram presence (boolean whether a character is in the name).

    Though, I have a hunch that using character n-gram frequency---possibly up to character trigrams---will give you pretty good results.