Tags: python, json, mediawiki, mediawiki-api

Getting Page Content via JSON


Link: http://creepypasta.wikia.com/api.php?%20action=query&prop=revisions&titles=Main_Page&rvprop=content&indexpageids=1&format=jsonfm

From the JSON returned by the URL above, I want to get the value of "*". I am using Python and already have the request set up. If I didn't need the page ID to reach the page content, I could do this directly, but since the content is nested under the page ID, I have run into trouble and need some help.


Solution

  • That page isn't actually JSON; it is an HTML rendering of the JSON. To request the raw JSON, remove the 'fm' at the end of the URL (format=json instead of format=jsonfm).

    In this code, I load the JSON into a dictionary using the urllib2 and json modules, and then access the * item.

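    import json
    import urllib2  # Python 2 only; in Python 3 use urllib.request.urlopen instead

    url = "http://creepypasta.wikia.com/api.php?%20action=query&prop=revisions&titles=Main_Page&rvprop=content&indexpageids=1&format=json"
    j = json.load(urllib2.urlopen(url))
    value = j['query']['pages']['22491']['revisions'][0]['*']  # '22491' is the page ID key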
    

    If you do not know which page ID to look up, consider the method found here (replicated below), which searches the nested dictionaries recursively for a key:

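    def _finditem(obj, key):
        # Return the value of `key` from the first dict that contains it,
        # searching nested dicts depth-first; returns None if it is not found.
        if key in obj:
            return obj[key]
        for v in obj.values():
            if isinstance(v, dict):
                item = _finditem(v, key)
                if item is not None:
                    return item

    value = _finditem(j, 'revisions')[0]['*']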
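Since the request URL already includes indexpageids=1, the response should also carry a query.pageids list of the returned page IDs (as strings), so the ID can be read directly instead of hard-coded or searched for. The sketch below is a Python 3 illustration under stated assumptions: the helper names `build_query_url` and `extract_content` are hypothetical, the title is assumed to exist (a missing title has no revisions), the response is assumed to use the older layout with the content under "*" as in the question, and the `sample` dict is a made-up, trimmed response used only so the example runs offline.

```python
import json
from urllib.parse import urlencode


def build_query_url(api_url, title):
    """Hypothetical helper: build a MediaWiki query URL that returns raw JSON."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "content",
        "indexpageids": 1,  # ask the API to list the returned page IDs
        "format": "json",   # 'jsonfm' would return an HTML page instead
    }
    return api_url + "?" + urlencode(params)


def extract_content(data):
    """Hypothetical helper: return the '*' content of the first returned page."""
    page_id = data["query"]["pageids"][0]  # page IDs are listed as strings
    page = data["query"]["pages"][page_id]
    return page["revisions"][0]["*"]


# Made-up, trimmed response shaped like the one in the question (offline demo).
sample = json.loads(
    '{"query": {"pageids": ["22491"], "pages": {"22491": '
    '{"pageid": 22491, "title": "Main Page", '
    '"revisions": [{"*": "Welcome text"}]}}}}'
)

print(build_query_url("http://creepypasta.wikia.com/api.php", "Main_Page"))
print(extract_content(sample))
```

For a live request, `json.load(urllib.request.urlopen(url))` would produce the `data` dict passed to `extract_content`.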