nlp stanford-nlp

English dictionary in readable format (text or xml)


I am hoping to find a downloadable (free or paid) English dictionary, preferably from Oxford, Cambridge, or Webster, in text or XML format for some NLP work.

I hope that each entry has

  • a full part of speech,
  • pronunciation,
  • morphology (for verbs and nouns),
  • multiple sense/definition entries

as on the following page: http://www.merriam-webster.com/dictionary/side.

The actual text of the definitions is not important. What I need most is the part of speech, pronunciation, morphology, and the order of the definition entries.

I am also wondering: what lexical resources does the Stanford NLP toolkit use when it does POS tagging?

Thank you.


Solution

  • Here and here are similar questions. In summary:

    1. Part-of-speech dictionary (unfortunately, with a rather narrow tag set).
    2. Pronouncing Dictionary
    3. Multiple senses - WordNet
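As a rough illustration of how a plain-text pronouncing dictionary can be loaded, here is a minimal sketch. It assumes a CMUdict-style layout (one entry per line: the word, then ARPAbet phonemes with stress digits, with alternative pronunciations marked as `WORD(1)`); the sample lines and the function names are illustrative assumptions, not taken from the resources linked above.

```python
import re
from collections import defaultdict

# Illustrative sample in an assumed CMUdict-style layout.
SAMPLE = """\
;;; comment lines start with three semicolons
SIDE  S AY1 D
READ  R EH1 D
READ(1)  R IY1 D
"""


def parse_pronunciations(text):
    """Map each lowercased word to a list of pronunciations (phoneme lists)."""
    entries = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";;;"):
            continue  # skip blank lines and comments
        head, *phones = line.split()
        # Strip the "(1)", "(2)", ... suffix that marks alternative pronunciations.
        word = re.sub(r"\(\d+\)$", "", head).lower()
        entries[word].append(phones)
    return dict(entries)


def stress_pattern(phones):
    """Collect the stress digits attached to the vowels, e.g. 'AY1' gives '1'."""
    return "".join(ch for phone in phones for ch in phone if ch.isdigit())


PRON = parse_pronunciations(SAMPLE)
```

Keeping a list of pronunciations per word preserves their order in the file, which matters if the first variant is treated as the primary one.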

    A morphological dictionary can be found in the FreeLing distribution; see data/en/dicc.src. FreeLing also ships sense and phonetic dictionaries.
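To show how such a morphological dictionary might be used, here is a minimal sketch. It assumes each line holds a word form followed by lemma/tag pairs (`form lemma1 tag1 lemma2 tag2 ...`), which is how I understand the FreeLing dictionary layout; check this against the actual data/en/dicc.src before relying on it. The sample lines and helper names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample lines in the assumed "form lemma tag [lemma tag ...]" layout.
SAMPLE_LINES = [
    "sides side NNS side VBZ",
    "sided side VBD side VBN",
    "side side NN side VB side VBP",
]


def parse_entry(line):
    """Split one line into the word form and its (lemma, tag) analyses."""
    form, *rest = line.split()
    if len(rest) % 2 != 0:
        raise ValueError(f"expected lemma/tag pairs after the form: {line!r}")
    return form, list(zip(rest[0::2], rest[1::2]))


def build_paradigms(lines):
    """Group all (form, tag) pairs under their lemma to recover its inflections."""
    paradigms = defaultdict(set)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        form, analyses = parse_entry(line)
        for lemma, tag in analyses:
            paradigms[lemma].add((form, tag))
    return {lemma: sorted(forms) for lemma, forms in paradigms.items()}


PARADIGMS = build_paradigms(SAMPLE_LINES)
```

Inverting the form-to-lemma mapping this way gives, for each lemma, the inflected forms and their tags, which covers the noun and verb morphology asked about in the question.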

    About the Stanford POS tagger: it uses the Penn Treebank (proof).