
How to use mwapi library to get a wikipedia page?


I have been trying to make sense of the documentation for the mwapi library (a client for the MediaWiki API), and I cannot figure out how to simply request a page based on a search query or keyword. I know I should use get(), but filling in the parameters with keywords yields errors. Does anyone know how to look up something like "Earth Wind and Fire"?

Documentation can be found here: http://pythonhosted.org/mwapi

Here is the only example the documentation gives of get() being used (each print() is followed by its output):

import mwapi

session = mwapi.Session('https://en.wikipedia.org')

print(session.get(action='query', meta='userinfo'))

{'query': {'userinfo': {'anon': '', 'name': '75.72.203.28', 'id': 0}}, 'batchcomplete': ''}

print(session.get(action='query', prop='revisions', revids=32423425))

{'query': {'pages': {'1429626': {'ns': 0, 'revisions': [{'user': 'Wknight94', 'parentid': 32276615, 'comment': '/* References */ Removing less-specific cat', 'revid': 32423425, 'timestamp': '2005-12-23T00:07:17Z'}], 'title': 'Grigol Ordzhonikidze', 'pageid': 1429626}}}, 'batchcomplete': ''}


Solution

  • Maybe this code will help you understand the API:

    import json  # Used only to pretty-print dictionaries.
    import mwapi
    
    USER_AGENT = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.0.6) Gecko/2009011913  Firefox'
    
    session = mwapi.Session('https://en.wikipedia.org', user_agent=USER_AGENT)
    
    # Look the page up by its exact title; the result is keyed by page ID.
    query = session.get(action='query', titles='Earth Wind and Fire')
    print('query returned:')
    print(json.dumps(query, indent=4))
    
    pages = query['query']['pages']
    if pages:
        print('\npages:')
        for pageid in pages:
            # Fetch the rendered HTML of each page that the query returned.
            data = session.get(action='parse', pageid=pageid, prop='text')
            print(json.dumps(data, indent=4))
    

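    Output (the title "Earth Wind and Fire" is a redirect page, so the parsed text contains only the redirect notice):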

    query returned:
    {
        "batchcomplete": "",
        "query": {
            "pages": {
                "313370": {
                    "pageid": 313370,
                    "ns": 0,
                    "title": "Earth Wind and Fire"
                }
            }
        }
    }
    
    pages:
    {
        "parse": {
            "title": "Earth Wind and Fire",
            "pageid": 313370,
            "text": {
                "*": "<div class=\"redirectMsg\"><p>Redirect to:</p><ul class=\"redirectText\"><li><a href=\"/wiki/Earth,_Wind_%26_Fire\" title=\"Earth, Wind &amp; Fire\">Earth, Wind &amp; Fire</a></li></ul></div><div class=\"mw-parser-output\">\n\n<!-- \nNewPP limit report\nParsed by mw1279\nCached time: 20171121014700\nCache expiry: 1900800\nDynamic content: false\nCPU time usage: 0.000 seconds\nReal time usage: 0.001 seconds\nPreprocessor visited node count: 0/1000000\nPreprocessor generated node count: 0/1500000\nPost\u2010expand include size: 0/2097152 bytes\nTemplate argument size: 0/2097152 bytes\nHighest expansion depth: 0/40\nExpensive parser function count: 0/500\n-->\n<!--\nTransclusion expansion time report (%,ms,calls,template)\n100.00%    0.000      1 -total\n-->\n</div>\n<!-- Saved in parser cache with key enwiki:pcache:idhash:313370-0!canonical and timestamp 20171121014700 and revision id 16182229\n -->\n"
            }
        }
    }
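The question asks about finding a page by keyword rather than by exact title, and the output above shows that an exact title can land on a redirect stub. The following is a minimal sketch of one way to handle both cases. It assumes the standard MediaWiki parameters list=search with srsearch and srlimit, and the redirects flag of action=parse. The helper names (search_params, hit_titles, parse_params, fetch_first_match) are hypothetical and not part of mwapi; they only build the keyword arguments that get passed to session.get().

```python
def search_params(keyword, limit=5):
    """Build parameters for a full-text search (MediaWiki list=search)."""
    return {
        "action": "query",
        "list": "search",
        "srsearch": keyword,
        "srlimit": limit,
    }


def hit_titles(response):
    """Extract page titles from a list=search response, in the order returned."""
    hits = response.get("query", {}).get("search", [])
    return [hit["title"] for hit in hits]


def parse_params(title):
    """Build parameters to render a page as HTML, resolving redirects."""
    # Any value for "redirects" turns the flag on; 1 is used here.
    return {"action": "parse", "page": title, "prop": "text", "redirects": 1}


def fetch_first_match(keyword, user_agent):
    """Search for keyword and return the parse result of the first hit, or None.

    Needs network access and the mwapi package, so the import is kept local
    and the pure helpers above stay usable without either.
    """
    import mwapi

    session = mwapi.Session("https://en.wikipedia.org", user_agent=user_agent)
    titles = hit_titles(session.get(**search_params(keyword)))
    if not titles:
        return None
    return session.get(**parse_params(titles[0]))
```

Calling fetch_first_match("Earth Wind and Fire", "my-demo-script/0.1 (me@example.org)") should return a dictionary shaped like the "parse" output above, with the HTML under ["parse"]["text"]["*"]. The user-agent string here is a placeholder; replace it with one that identifies your own script and contact details.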