web-crawler, wikipedia, wikimedia

Get all page ids linked to a given Wikipedia page


I am trying to use the public Wikimedia APIs to access the English Wikipedia database.

I would like a way to obtain the page ids of all pages linked to a given page.

If I make a request like this: http://en.wikipedia.org/w/api.php?action=query&titles=computer&format=xml

I am only able to obtain the page id of the 'computer' page.

I know I could parse the 'href' attributes inside that page and make n queries, but that is not very efficient.

Can I achieve this through the API alone?


Solution

  • It looks like you're looking for the backlinks module, which lists the pages that link to a given page.

    With that, you can do something like:

    http://en.wikipedia.org/w/api.php?action=query&bltitle=computer&list=backlinks&format=xml

    Also, the API pages its results, so you'll most likely need to add &bllimit=max to the query and then make follow-up requests, passing back the blcontinue value returned with each response, to get the remaining pages.
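
The paging described above can be sketched in Python roughly as follows. This is a minimal illustration rather than a definitive client: the names `fetch_json` and `iter_backlink_ids` and the User-Agent string are made up for this example, and it assumes the JSON output format (`format=json`) instead of XML, with the response carrying a `continue` object whenever more results remain.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def fetch_json(params):
    """Send one GET request to the API and return the decoded JSON body."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    # The User-Agent value below is a placeholder chosen for this example.
    request = urllib.request.Request(
        url, headers={"User-Agent": "backlinks-example/0.1"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_backlink_ids(title, fetch=fetch_json):
    """Yield the page id of every page that links to `title`.

    `fetch` is injectable so the paging logic can be exercised offline.
    """
    params = {
        "action": "query",
        "list": "backlinks",
        "bltitle": title,
        "bllimit": "max",
        "format": "json",
    }
    while True:
        data = fetch(params)
        for page in data.get("query", {}).get("backlinks", []):
            yield page["pageid"]
        if "continue" not in data:
            break  # no continuation values: this was the last batch
        # Copy the continuation values (such as blcontinue) into the next request.
        params = {**params, **data["continue"]}
```

Calling `list(iter_backlink_ids("computer"))` would then collect the ids across all batches; for a heavily linked page that can mean many requests, so iterating lazily over the generator may be preferable.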