python wikipedia-api

Wikipedia API for Python: how can I get the item ID from a corresponding page?


I am using Wikipedia-API 0.5.4, and I would like to retrieve the item ID for the item being discussed on a given page. Is it possible to do this using the data returned from a page query?

I am able to retrieve the pageid. However, pages in different languages about the same item have different pageids, even though they all refer to a single item with a unique item ID.

In the example below, the pageid for the English language page on the singer Cher is different from the pageid for the corresponding French language page, while the item ID for "Cher" should be the same in both cases.

Is the item ID not accessible from the page object?

import wikipediaapi as wp
wp_en = wp.Wikipedia('en')
cher_en = wp_en.page('Cher')

print(cher_en.pageid)
> 80696

print(cher_en.langlinks['fr'].pageid)
> 339022

Solution

  • I ended up using the requests library to call the MediaWiki Action API (api.php) directly. Including prop=pageprops in the query returns the Wikidata item ID (the wikibase_item property), which is shared across the different language editions.

    import requests as rq
    
    # Ask the English Wikipedia for the page properties of "Cher"
    request_str = 'https://en.wikipedia.org/w/api.php?action=query&prop=pageprops&titles=Cher&format=json'
    resp = rq.get(request_str)
    # Pull the value that follows the "wikibase_item" key out of the JSON text
    resp.text.split('wikibase_item":"')[1].split('"')[0]
    > 'Q12003'
    
    # Same query against the French Wikipedia; note that it must use fr_str
    fr_str = 'https://fr.wikipedia.org/w/api.php?action=query&prop=pageprops&titles=Cher_(artiste)&format=json'
    fr_resp = rq.get(fr_str)
    fr_resp.text.split('wikibase_item":"')[1].split('"')[0]
    > 'Q12003'
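
Splitting the raw response text works, but it raises an IndexError when the page has no wikibase_item. A slightly more defensive variant parses the JSON instead. The following is only a sketch: the function names extract_wikibase_item and fetch_wikibase_item and the User-Agent string are made up for illustration, and it uses the standard library rather than requests so that it has no dependencies. It assumes the default format=json layout, where query.pages is a dict keyed by pageid.

```python
import json
import urllib.parse
import urllib.request


def extract_wikibase_item(data):
    """Return the wikibase_item from a parsed action=query response, or None."""
    # With format=json the pages are returned as a dict keyed by pageid.
    pages = data.get("query", {}).get("pages", {})
    for page in pages.values():
        item = page.get("pageprops", {}).get("wikibase_item")
        if item is not None:
            return item
    return None


def fetch_wikibase_item(title, lang="en"):
    """Look up the item ID for a page title on one language edition (needs network)."""
    params = urllib.parse.urlencode({
        "action": "query",
        "prop": "pageprops",
        "ppprop": "wikibase_item",  # only return the property we need
        "titles": title,
        "redirects": 1,             # follow redirects to the target page
        "format": "json",
    })
    url = f"https://{lang}.wikipedia.org/w/api.php?{params}"
    # Hypothetical User-Agent value; replace it with one that identifies your tool.
    req = urllib.request.Request(url, headers={"User-Agent": "item-id-example/0.1"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return extract_wikibase_item(json.load(resp))
```

Calling fetch_wikibase_item('Cher') and fetch_wikibase_item('Cher (artiste)', lang='fr') should then give the same ID for both pages, matching the 'Q12003' reported above. Passing the parameters as a dict also takes care of URL-encoding titles that contain spaces or parentheses.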