Tags: php, mediawiki, mediawiki-api

Retrieving the actual displayed link value from MediaWiki API


The MediaWiki API lets you retrieve all links for a specific article. However, the text displayed for a link on the wiki page can differ from the link target because of the wiki piped link format.

The API returns the actual underlying link target, but I haven't been able to find a way to also retrieve the displayed text that the reader actually sees.

For example, in the following sentence:

"[[Second Polish Republic|Poland]] was invaded by [[Nazi Germany|Germany]] during World War II."

I want to retrieve not only the actual link targets (Second Polish Republic, Nazi Germany) but also their corresponding displayed values (Second Polish Republic => Poland, Nazi Germany => Germany).

Is there a way to do this?

Here is an example of the format I have been using for my request:

http://en.wikipedia.org/w/api.php?&action=query&redirects=&indexpageids=&prop=links&format=json&titles=World_War_II&pllimit=500


Solution

  • There is no documented way to do this.

    Your best shot is probably to parse the wikitext, looking for [[Link target|Link text]]:

    1. Get the page's full wikitext using action=parse (with prop=wikitext)
    2. Use action=expandtemplates to expand templates, so that links inside a template are included too
    3. Use a regex to extract every link together with its link text
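
Steps 1 and 3 could be sketched roughly as follows, in Python. This is only a minimal illustration, not an official API feature: the helper names `fetch_wikitext` and `extract_links`, the User-Agent string, and the regex itself are assumptions made for this example. The regex does not handle nested brackets (for example image captions that contain links), and step 2 (action=expandtemplates) is left out.

```python
import json
import re
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"

# Matches [[Target]] and [[Target|Label]].
# Assumption: no nested brackets inside the link.
WIKILINK_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")


def fetch_wikitext(title):
    """Fetch the raw wikitext of a page via action=parse (hypothetical helper)."""
    params = urllib.parse.urlencode({
        "action": "parse",
        "page": title,
        "prop": "wikitext",
        "format": "json",
        "formatversion": "2",  # with version 2, parse.wikitext is a plain string
    })
    request = urllib.request.Request(
        API_URL + "?" + params,
        # Placeholder User-Agent; replace with one that identifies your client.
        headers={"User-Agent": "link-text-example/0.1"},
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return data["parse"]["wikitext"]


def extract_links(wikitext):
    """Return a list of (target, displayed_text) pairs found in wikitext."""
    pairs = []
    for match in WIKILINK_RE.finditer(wikitext):
        target = match.group(1).strip()
        label = match.group(2)
        # An unpiped link such as [[Poland]] displays its own target.
        if label is None or label.strip() == "":
            label = target
        pairs.append((target, label.strip()))
    return pairs


sample = ("[[Second Polish Republic|Poland]] was invaded by "
          "[[Nazi Germany|Germany]] during World War II.")
print(extract_links(sample))
```

For a live page, something like `extract_links(fetch_wikitext("World War II"))` would apply the same regex to the real article text. Note that the result will also include category, file, and interwiki links, which you may want to filter out by prefix.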