Tags: python-3.x, wikipedia, text-extraction, data-extraction

How to solve Wikipedia API Page Error while reading in python?


I am working on a document summarizer NLP project, so I wanted to extract Elon Musk's bio from Wikipedia. I tried to extract it with the help of the Wikipedia library (API).

I first tried with the page title (i.e. Elon Musk), but it gives me a page error: PageError: Page id "e on musk" does not match any pages. Try another id! Did you notice the page id it is showing, "e on musk"? Then I tried with its page id number (i.e. Q317521), which returns results about some plant, 'Matthiola incana'.

Elon Musk Wikipedia page

Here is my code

import wikipedia

elon = wikipedia.page('Elon Musk').content
elon
# outputs
PageError: Page id "e on musk" does not match any pages. Try another id!


elon = wikipedia.page('Q317521').content
elon
# outputs (shortened)
Matthiola incana is a species of flowering plant in the cabbage family Brassicaceae. Common names include Brompton stock,

I tried with Alan Turing, which is not working either, and I also tried Albert_Einstein, which shows weird output just like Elon Musk.

However, it worked with Nikola Tesla, Michio Kaku, Narendra Modi, etc., which shows that I am not doing it wrong.


Solution

  • wikipedia.page is kind of crap. It uses Wikipedia's search suggestion API to transform its title parameter before looking it up on Wikipedia. Search suggestions (something like Google's "did you mean...?" feature) are completely unfit for this purpose; they are a last-ditch effort for turning a zero-result search into one that yields results, by looking for the closest (in terms of edit distance) string made up of terms from a dictionary of commonly used words. This works well for fixing typos, but it is absolutely not meant to be used for search terms which do yield results, much less for actual article titles.

    You can disable this behavior with auto_suggest=False (see the sketch below), although given that half the bug reports for wikipedia are about this issue, some going back almost a decade, you might want to look for a better maintained library.
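
    A minimal sketch of that workaround, assuming the same wikipedia package the question imports (the print slice is just for illustration):

    import wikipedia

    # Pass the exact article title and disable the search-suggestion lookup,
    # so "Elon Musk" is not silently rewritten (e.g. into "e on musk")
    # before the page is fetched.
    elon = wikipedia.page('Elon Musk', auto_suggest=False).content
    print(elon[:300])  # first few hundred characters of the article text

    With auto_suggest=False, a title that does not match an article exactly raises PageError instead of quietly fetching some unrelated page, which is usually what you want here.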