Tags: java, json, parsing, xwiki

Wikipedia content parsing JSON


I want to get the contents of a Wikipedia page and then do some funny stuff with it.

Ideally I would get the content in XML or JSON format, but so far I have not found a way to do it.

This is as far as I have got:

https://en.wikipedia.org/w/api.php?action=query&format=jsonfm&prop=revisions&titles=April_1&rvprop=content&rvcontentformat=text%2Fx-wiki

But I receive the content in XWiki format, and I cannot switch it to JSON because the page does not support it.

How can I parse the XWiki content into JSON, or how else can I get the contents of the page?

Thanks!


Solution

  • Yes, you can use the HTML parser in XWiki Rendering to parse the HTML generated by Wikipedia. This gives you an AST that you can process however you wish.

    See http://rendering.xwiki.org/xwiki/bin/view/Main/WebHome for more details.

    You just need to find a way to get the Wikipedia content in HTML.
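
One possible way to get the page as HTML, as a minimal sketch rather than a definitive implementation: the MediaWiki API has an action=parse module that returns the rendered HTML of a page, and format=json returns raw JSON (format=jsonfm, used in the question, is the pretty-printed HTML debug view). The class and method names below (WikipediaHtmlFetcher, buildParseUrl, fetch) and the User-Agent string are made up for illustration, and the code assumes Java 11 or later for java.net.http. It only downloads the JSON response; pulling the HTML string out of it and handing it to the XWiki Rendering HTML parser is not shown.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: asks the MediaWiki API for the rendered HTML of a page.
final class WikipediaHtmlFetcher {
    private static final String API = "https://en.wikipedia.org/w/api.php";

    // Builds an action=parse request URL for the given page title.
    // With formatversion=2 the HTML is expected as a plain string under
    // "parse" -> "text" in the JSON response.
    static String buildParseUrl(String title) {
        String encoded = URLEncoder.encode(title, StandardCharsets.UTF_8);
        return API + "?action=parse&format=json&formatversion=2&prop=text&page=" + encoded;
    }

    // Performs a GET request and returns the response body (JSON text).
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                // Placeholder User-Agent; replace with one that identifies your tool.
                .header("User-Agent", "wiki-fetch-example/0.1")
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status: " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // Defaults to the page from the question when no title is given.
        String title = args.length > 0 ? args[0] : "April_1";
        System.out.println(fetch(buildParseUrl(title)));
    }
}
```

Any JSON library can then read the HTML string from the response before it is passed to the HTML parser mentioned above.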