
mediawiki-api - links on page & getting fields on those pages


If I have a Wikimedia category such as "Category:Google_Art_Project_works_by_Vincent_van_Gogh", is there an API to retrieve a list of the URLs linked to on that page?

I've tried this, but it doesn't return any links: https://en.wikipedia.org/w/api.php?action=query&titles=Category:Google_Art_Project_works_by_Vincent_van_Gogh&prop=links

(If not, I'll parse the HTML and obtain them that way.)

Once I have all the URLs linked to, is there an API to retrieve some of the information on the page? (Summary/Artist, Title, Date, Dimensions, Current location, Licensing)

I've tried this, but it doesn't seem to have a way to return that information: https://en.wikipedia.org/w/api.php?action=query&titles=File:Irises-Vincent_van_Gogh.jpg&prop=imageinfo&iiprop=url


Solution

  • is there an API to retrieve a list of the URLs linked to on that page?

    You are probably looking for the categorymembers API (list=categorymembers), which lists the pages in the selected category.

    I've tried this, but it doesn't return any links: https://en.wikipedia.org/w/api.php?action=query&titles=Category:Google_Art_Project_works_by_Vincent_van_Gogh&prop=links

    First, note that this is a Wikimedia Commons category, so querying en.wikipedia.org returned a missing page. However, even if you query the right project (commons.wikimedia.org), you will notice that the category description page itself does not contain any links.

    Once I have all the URLs linked to, is there an API to retrieve some of the information on the page?

    You can use the categorymembers query as a generator and then specify the usual properties that you want from each page. However, the metadata you seem to be interested in is not available via the API; you need to parse it out of each image description text.

    Try https://commons.wikimedia.org/w/api.php?action=query&generator=categorymembers&gcmtitle=Category%3aGoogle_Art_Project_works_by_Vincent_van_Gogh&prop=links|imageinfo|revisions&iiprop=timestamp|user|url|size|mime&rvprop=ids|content&rvgeneratexml
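
As a rough illustration of the generator approach, here is a minimal Python sketch that builds a similar query and pulls the title, file URL, and description wikitext out of a decoded JSON response. The helper names (build_query_url, extract_pages, fetch), the User-Agent string, and the limit of 50 are made up for this example. It also assumes the default JSON layout, where pages are keyed by page ID and the revision text sits under the "*" key; newer output formats may differ. Continuation (gcmcontinue) is not handled, so only the first batch of members is returned.

```python
import json
import urllib.parse
import urllib.request

API = "https://commons.wikimedia.org/w/api.php"
CATEGORY = "Category:Google_Art_Project_works_by_Vincent_van_Gogh"


def build_query_url(category, limit=50):
    """Build a generator=categorymembers query URL for the Commons API."""
    params = {
        "action": "query",
        "format": "json",
        "generator": "categorymembers",
        "gcmtitle": category,
        "gcmlimit": limit,  # assumed batch size, not from the original answer
        "prop": "imageinfo|revisions",
        "iiprop": "timestamp|user|url|size|mime",
        "rvprop": "ids|content",
    }
    return API + "?" + urllib.parse.urlencode(params)


def extract_pages(response):
    """Return (title, file URL, wikitext) for each page in a decoded response."""
    results = []
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        # Pages that are not files (e.g. subcategories) may lack these keys.
        info = page.get("imageinfo") or [{}]
        revs = page.get("revisions") or [{}]
        # Assumption: legacy JSON layout with the wikitext under "*".
        results.append((page.get("title"), info[0].get("url"), revs[0].get("*")))
    return results


def fetch(url):
    """Download and decode one API response (placeholder User-Agent)."""
    req = urllib.request.Request(url, headers={"User-Agent": "example-script/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Calling extract_pages(fetch(build_query_url(CATEGORY))) would then give one tuple per category member. The fields from the question (artist, date, dimensions and so on) still have to be parsed out of the returned wikitext.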