
Is there a way to fetch Wikipedia/Wiktionary HTML only for the main article body?


A decade or more ago I used to do quite a bit of hacking on Wiktionary and MediaWiki. I seem to remember that I used to use some method to get the HTML of a page without most of the interface like menus, sidebars, header, footer etc.

I thought it was just done with special URL parameters, much like raw and printable, but I can't seem to find it in the URL help page.

It could be that my memory isn't accurate and maybe I used the old API.

This was a boon for doing things with text corpora, such as playing with Markov chains, and also for trying to parse the Wiktionary format. I used to grab this version and convert it to plain text or run it through an HTML parser.

One thing I seem to remember that might help is that it didn't remove all of the unwanted stuff. For instance I think it retained the Table of Contents. But it got rid of the vast majority of it.

Does anyone know if there's a URL parameter for this, or know what I might've been doing back in the day? If not, a method using the old API or the new REST API would be of interest.

What did I try?
I used Google, I searched here on Stack Overflow, I racked my brain, and I hunted for the URL parameter documentation.

What did I expect?
I expected to remember what I used to do, or to find the old way I used to do it documented, or a new way that achieved the same end.


Solution

  • Instead of action=raw, try action=render. (The relevant documentation is Parameters to index.php.) For serious use, though, I'd recommend using the API instead.
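
As a rough illustration of both options, here is a minimal Python sketch. It assumes the Wikimedia URL layout (`/w/index.php` and `/w/api.php`), which other MediaWiki installs may not share, and that the Action API's `action=parse` module with `prop=text` and `formatversion=2` is what "using the API" would mean here. The function names and the User-Agent string are hypothetical placeholders, not part of any library.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def render_url(title, host="en.wiktionary.org"):
    """Build an index.php URL that returns only the rendered article body."""
    # Assumes the Wikimedia script path /w/index.php.
    query = urlencode({"title": title, "action": "render"})
    return f"https://{host}/w/index.php?{query}"


def parse_api_url(title, host="en.wiktionary.org"):
    """Build an Action API URL that returns the parsed page HTML as JSON."""
    query = urlencode({
        "action": "parse",
        "page": title,
        "prop": "text",
        "format": "json",
        "formatversion": "2",
    })
    return f"https://{host}/w/api.php?{query}"


def extract_parse_html(payload):
    """Pull the HTML string out of an action=parse JSON response."""
    # With formatversion=2 the HTML sits directly under parse.text.
    return json.loads(payload)["parse"]["text"]


def fetch(url, user_agent="example-script/0.1 (placeholder contact)"):
    """Download a URL as text; the User-Agent value is a placeholder."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling `fetch(render_url("hello"))` should return the body HTML directly, while `extract_parse_html(fetch(parse_api_url("hello")))` goes through the API. Neither call is made automatically by the sketch, so nothing touches the network until you invoke `fetch` yourself.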