Tags: python, mediawiki, wikipedia, wikipedia-api, pywikibot

How to find subcategories and subpages on Wikipedia using pywikibot?


I would like to get all subcategories (87) and all pages (200, listed in the 'Pages in category "Masculine given names"' section) of this category: https://en.wikipedia.org/wiki/Category:Masculine_given_names

I tried it with the following code:

import pywikibot
site = pywikibot.Site("en", "wikipedia")
page = pywikibot.Page(site, 'Category:Masculine_given_names')
# categories() lists the categories this page itself belongs to
print(list(page.categories()))

But this only gives me the categories at the very bottom of the page. How can I get the subcategories and (sub)pages of this category?


Solution

  • How can I get the subcategories and (sub)pages of a given category?

    First, you have to use the Category class instead of the Page class. It is created in much the same way:

      >>> import pywikibot
      >>> site = pywikibot.Site("en", "wikipedia")
      >>> cat = pywikibot.Category(site, 'Masculine_given_names')
    

    The Category class has additional methods; refer to the documentation for further information and the available parameters. For example, the categoryinfo property gives a short overview of the category's contents:

      >>> cat.categoryinfo
      {'size': 1425, 'pages': 1336, 'files': 0, 'subcats': 89}
    

    In this case, the category has 1425 entries: 1336 pages, 0 files and 89 subcategories. These counts change over time as the category is edited.
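The keys of categoryinfo are related: 'size' should equal 'pages' + 'files' + 'subcats'. The sketch below only illustrates that relationship; summarize_categoryinfo is a hypothetical helper, not part of pywikibot, and the dict is copied from the output above so the example runs offline.

```python
def summarize_categoryinfo(info):
    """Return a one-line summary of a categoryinfo-style dict.

    Assumes the keys shown above: 'size', 'pages', 'files', 'subcats'.
    """
    total = info["pages"] + info["files"] + info["subcats"]
    summary = (f"{info['size']} entries: {info['pages']} pages, "
               f"{info['files']} files, {info['subcats']} subcategories")
    # Flag the case where the parts do not add up to the reported size.
    if total != info["size"]:
        summary += f" (parts add up to {total}, not {info['size']})"
    return summary


# Values copied from the categoryinfo output above; live values will differ.
info = {'size': 1425, 'pages': 1336, 'files': 0, 'subcats': 89}
print(summarize_categoryinfo(info))
# 1425 entries: 1336 pages, 0 files, 89 subcategories
```

With a live category you would call it as summarize_categoryinfo(cat.categoryinfo).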

    To get all subcategories, use the subcategories() method:

      >>> gen = cat.subcategories()
    

    Note that this returns a generator. As shown below, it yields all of them, matching the count in categoryinfo above:

      >>> len(list(gen))
      89
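Because subcategories() returns a generator, it can be consumed only once: after len(list(gen)) the same gen yields nothing. If you only need a few entries, itertools.islice stops pulling items after n of them. The sketch below uses a hypothetical FakePage class and fake_subcategories() generator (only title() is assumed, as on pywikibot pages) so that it runs without network access.

```python
from itertools import islice


def first_titles(pages, n):
    """Return the titles of at most n pages from an iterable of pages.

    islice stops pulling from the iterable after n items.
    """
    return [page.title() for page in islice(pages, n)]


class FakePage:
    """Hypothetical offline stand-in for a pywikibot page."""

    def __init__(self, title):
        self._title = title

    def title(self):
        return self._title


def fake_subcategories():
    # Hypothetical titles, used only so the example runs offline.
    for name in ["Category:A", "Category:B", "Category:C"]:
        yield FakePage(name)


gen = fake_subcategories()
print(len(list(gen)))  # 3
print(len(list(gen)))  # 0: the generator is already exhausted

print(first_titles(fake_subcategories(), 2))  # ['Category:A', 'Category:B']
```

Against the real category this would be first_titles(cat.subcategories(), 5). If you need the items more than once, store list(gen) in a variable instead of calling list() on the generator twice.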
    

    To get all pages (articles), use the articles() method:

      >>> gen = cat.articles()
    

    Guess how many entries the corresponding list will have.
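articles() only lists the pages directly in this category. To also collect pages from subcategories, one option is a breadth-first walk with a depth limit; category graphs can contain cycles, so visited categories are tracked by title. This is a hand-rolled sketch, assuming only the title(), articles() and subcategories() methods used above. pywikibot's subcategories() and articles() also accept a recurse argument, so check the documentation for your version before rolling your own. collect_articles, FakePage and FakeCategory are hypothetical names; the fakes let the example run offline.

```python
from collections import deque


def collect_articles(root, max_depth=2):
    """Collect article titles from root and its subcategories, breadth-first.

    Assumes root behaves like pywikibot.Category (title(), articles(),
    subcategories()). Categories are visited at most once, by title.
    """
    seen_cats = {root.title()}
    articles = set()  # a page can sit in several categories; keep it once
    queue = deque([(root, 0)])
    while queue:
        cat, depth = queue.popleft()
        for page in cat.articles():
            articles.add(page.title())
        if depth >= max_depth:
            continue  # do not descend below the depth limit
        for sub in cat.subcategories():
            if sub.title() not in seen_cats:
                seen_cats.add(sub.title())
                queue.append((sub, depth + 1))
    return articles


class FakePage:
    """Hypothetical offline stand-in for a pywikibot page."""

    def __init__(self, title):
        self._title = title

    def title(self):
        return self._title


class FakeCategory(FakePage):
    """Hypothetical offline stand-in for pywikibot.Category."""

    def __init__(self, title, article_titles=()):
        super().__init__(title)
        self._articles = [FakePage(t) for t in article_titles]
        self.subcats = []

    def articles(self):
        return iter(self._articles)

    def subcategories(self):
        return iter(self.subcats)


# A tiny category graph with a cycle: Names -> Given -> Names.
root = FakeCategory("Category:Names", ["Alice"])
given = FakeCategory("Category:Given", ["Bob", "Alice"])
root.subcats.append(given)
given.subcats.append(root)

print(sorted(collect_articles(root)))               # ['Alice', 'Bob']
print(sorted(collect_articles(root, max_depth=0)))  # ['Alice']
```

On Wikipedia, each visited category costs at least one API request and category trees can grow very large, which is why the depth limit is a required parameter of the walk rather than an afterthought.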

    Finally, the members() method yields all members of the category, including pages, files and subcategories:

      >>> gen = cat.members()
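Since members() mixes all three kinds of entries, you may want to split them again. The sketch below assumes each member provides is_categorypage() and is_filepage(), which pywikibot pages have; partition_members and FakePage are hypothetical names, and the fake titles only serve to make the example run offline.

```python
def partition_members(members):
    """Split category members into subcategories, files and other pages.

    Assumes each member has title(), is_categorypage() and is_filepage().
    """
    groups = {"subcats": [], "files": [], "pages": []}
    for page in members:
        if page.is_categorypage():
            groups["subcats"].append(page.title())
        elif page.is_filepage():
            groups["files"].append(page.title())
        else:
            groups["pages"].append(page.title())
    return groups


class FakePage:
    """Hypothetical offline stand-in for a pywikibot page."""

    def __init__(self, title):
        self._title = title

    def title(self):
        return self._title

    def is_categorypage(self):
        return self._title.startswith("Category:")

    def is_filepage(self):
        return self._title.startswith("File:")


members = [FakePage("Category:X"), FakePage("File:Y.jpg"), FakePage("Z")]
groups = partition_members(members)
print({key: len(titles) for key, titles in groups.items()})
# {'subcats': 1, 'files': 1, 'pages': 1}
```

With the real category, partition_members(cat.members()) should give group sizes corresponding to the subcats, files and pages counts in categoryinfo, give or take edits made in between.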