
Creating a list of words from Wikipedia


I am creating a game and I need a dictionary (in this case, a plain list of words) that contains not only the base form of each word but all of its inflected forms as well. The language is Italian, where verbs and nouns in particular have many forms.

Since the language is highly irregular, I want to take the words from a very large source that is likely to contain all of them. My first idea was Wikipedia: download every article, extract the text, and filter out the words.
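The "extract the text and filter the words" step could look roughly like the sketch below. It assumes the plain text of each article has already been extracted (for example from a Wikipedia dump); the function names `extract_words` and `build_word_list` and the `min_count` threshold are hypothetical choices for illustration, not part of any existing tool.

```python
import re
import unicodedata
from collections import Counter
from typing import Iterable

# Runs of letters only: \w minus digits and underscore. Accented letters
# such as "è" or "à" count as letters in Python's Unicode-aware regexes.
WORD_RE = re.compile(r"[^\W\d_]+")


def extract_words(text: str) -> list[str]:
    """Return the lowercase words found in a piece of plain text.

    Elided forms such as "l'acqua" are split at the apostrophe
    into "l" and "acqua".
    """
    # NFC normalization merges "e" + combining accent into a single "è",
    # so the regex does not split words at combining marks.
    text = unicodedata.normalize("NFC", text)
    return [m.group(0).lower() for m in WORD_RE.finditer(text)]


def build_word_list(texts: Iterable[str], min_count: int = 2) -> list[str]:
    """Count words across many texts and keep those seen at least min_count times.

    The threshold is a hypothetical heuristic for dropping typos and
    one-off foreign words; it would need tuning on a real corpus.
    """
    counts: Counter[str] = Counter()
    for text in texts:
        counts.update(extract_words(text))
    return sorted(word for word, n in counts.items() if n >= min_count)
```

For example, `build_word_list(["Il gatto beve l'acqua.", "Il cane beve il latte."])` keeps only the words that occur at least twice, `["beve", "il"]`. Counting occurrences rather than keeping every token is one way to reduce the noise (typos, names, foreign words) that a corpus as large as Wikipedia inevitably contains.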

This would take a very long time, so I'd like to know whether there are better solutions, both in terms of time and of how complete the resulting list is.


Solution

  • If you're on a Linux system you might want to look in /usr/share/dict/words.
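A minimal sketch of loading such a one-word-per-line file into a set for fast lookups in a game. It assumes the file exists at the given path and is UTF-8 encoded; `load_word_list` is a hypothetical helper name. The default `words` file is often an English list, so for Italian you would pass the path of an Italian list if your system provides one.

```python
from pathlib import Path


def load_word_list(path: str = "/usr/share/dict/words",
                   encoding: str = "utf-8") -> set[str]:
    """Read a one-word-per-line file into a set, skipping blank lines.

    The default path is the usual location on many Linux systems;
    the UTF-8 encoding is an assumption, so adjust it if needed.
    """
    words = set()
    with Path(path).open(encoding=encoding) as f:
        for line in f:
            word = line.strip()
            if word:
                words.add(word)
    return words
```

Usage: `words = load_word_list()` followed by `"acqua" in words` gives a constant-time membership check, which is usually what a word game needs.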