Why is part-of-speech tag for Adjectives 'JJ'?


What is the etymology of the JJ tag that denotes adjectives in part-of-speech (POS) tagging? I am unable to find any references online. There are several resources listing all the tags, but none explaining the reasoning behind them.


Solution

  • It may be impossible to get an official answer. JJ has been used since the Brown corpus, and appears without comment in publications going back to at least 1981 (just after publication of the 1979 Form C "revised and amplified" edition).

    Per this record of the corpus, the main publication by the authors accompanying Form C is the manual, available here. It contains the tag list, with plenty of explanation of how words are classified and none of how the tag names were chosen.

    After reviewing Role of the Brown Corpus in the History of Corpus Linguistics (Olga Kholkovskaia, 2017), I agree that the authors generally focused on the massive compilation and tagging method over commentary. The 1967 classic "Computational analysis of present-day American English" is mostly frequency tables, with no instance of "adjective" or JJ in it. I found no publications in which lead authors Francis and Kucera discuss their choice of tags, and both passed away in the 2000s.

    This limits us to speculation. The authors had 82 tags that needed to be short, memorable (the tagging process was partly manual), and able to take various modifiers as suffixes without creating confusion. Vowels are fairly useless for this, since every part-of-speech name in the table contains at least one. Verb (VB) and noun (NN) go by first-and-last letters, while others use initialisms (coordinating conjunction CC, foreign word FW), syllable initialisms (modal MD, predeterminer PDT), first letters (possessive POS), or arbitrary associations (interjection UH).

    Adjective's JJ is odd in not using the word's first letter, and it does not make intuitive sense like UH, possessive P$, or plural S. Still, it is hardly the strangest tag choice, even in the reduced Penn Treebank table. Perhaps someone wanted to match NN's style and doubled the first relatively uncommon letter in "adjective". Any more detailed answer may only be possible by finding unpublished notes or still-living colleagues.
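
The naming patterns speculated about in this answer can be checked mechanically. The sketch below is only an illustration of that guesswork: the helper names `first_and_last` and `initialism` are hypothetical, and the patterns are the ones proposed above, not rules documented by the Brown corpus authors.

```python
# Hypothetical helpers illustrating the speculated naming patterns.
# These are NOT documented rules from the Brown corpus manual.

def first_and_last(word: str) -> str:
    """Build a candidate tag from the first and last letters of a word."""
    return (word[0] + word[-1]).upper()


def initialism(phrase: str) -> str:
    """Build a candidate tag from the first letter of each word in a phrase."""
    return "".join(w[0] for w in phrase.split()).upper()


# (description, actual tag) pairs taken from the answer above.
examples = [
    ("verb", "VB"),
    ("noun", "NN"),
    ("coordinating conjunction", "CC"),
    ("foreign word", "FW"),
    ("adjective", "JJ"),
]

for description, tag in examples:
    candidates = {first_and_last(description), initialism(description)}
    verdict = "fits a pattern" if tag in candidates else "fits neither pattern"
    print(f"{description!r} -> {tag}: {verdict}")
```

Under these two patterns "adjective" would yield AE or A, both made only of vowels, which is consistent with the suggestion that a consonant such as J was picked instead.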