I'm trying to convert wikitext into plain text using Python regular expression substitution. There are two formatting rules for wiki links:
[[Name of page]]
[[Name of page | Text to display]]
(http://en.wikipedia.org/wiki/Wikipedia:Cheatsheet)
Here is some text that gives me a headache.
The CD is composed almost entirely of [[cover version]]s of [[The Beatles]] songs which George Martin [[record producer|produced]] originally.
The text above should be changed into:
The CD is composed almost entirely of cover versions of The Beatles songs which George Martin produced originally.
My main problem is the conflict between the [[ ]] and [[ | ]] syntaxes. I don't need a single complex regular expression; applying several (maybe two) regular expression substitutions in sequence is fine.
Please enlighten me on this problem.
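A single regular expression with an optional non-capturing "page|" prefix handles both forms:
import re
wikilink_rx = re.compile(r'\[\[(?:[^|\]]*\|)?([^\]]+)\]\]')  # skip optional "page|", capture the text to keep
def strip_wikilinks(the_string):  # function name is illustrative
    return wikilink_rx.sub(r'\1', the_string)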
Example: http://ideone.com/7oxuz
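For reference, here is a minimal self-contained sketch that runs the same pattern against the sentence from the question. The names `WIKILINK_RX`, `strip_wikilinks`, and `sample` are only illustrative and are not part of any library.

```python
import re

# Optional "page|" prefix (non-capturing), then the display text (group 1).
WIKILINK_RX = re.compile(r'\[\[(?:[^|\]]*\|)?([^\]]+)\]\]')

def strip_wikilinks(text):
    """Replace [[page]] and [[page|label]] with their visible text."""
    return WIKILINK_RX.sub(r'\1', text)

sample = ("The CD is composed almost entirely of [[cover version]]s of "
          "[[The Beatles]] songs which George Martin "
          "[[record producer|produced]] originally.")

print(strip_wikilinks(sample))
# The CD is composed almost entirely of cover versions of The Beatles
# songs which George Martin produced originally.  (printed on one line)
```

Note that this pattern keeps any spaces that surround the pipe, so `[[Name of page | Text to display]]` would come out with a leading space.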
Note: you can also find several MediaWiki parsers at http://www.mediawiki.org/wiki/Alternative_parsers.
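If you would rather use the two-step approach mentioned in the question, a rough sketch could look like the following. It handles piped links first, then plain links, and additionally trims whitespace around the kept text. The names `PIPED_RX`, `PLAIN_RX`, and `strip_wikilinks_two_pass` are made up for this example, and the sketch assumes simple, non-nested links only.

```python
import re

# Pass 1: piped links [[page | label]] -> label (surrounding whitespace trimmed).
PIPED_RX = re.compile(r'\[\[[^|\]]*\|\s*([^\]]*?)\s*\]\]')
# Pass 2: plain links [[page]] -> page (surrounding whitespace trimmed).
PLAIN_RX = re.compile(r'\[\[\s*([^|\]]*?)\s*\]\]')

def strip_wikilinks_two_pass(text):
    """Apply the piped-link substitution, then the plain-link substitution."""
    text = PIPED_RX.sub(r'\1', text)
    return PLAIN_RX.sub(r'\1', text)

print(strip_wikilinks_two_pass("[[Name of page | Text to display]]"))
# Text to display
```

Running the piped pass first matters only for readability here, since the plain pattern excludes `|` and cannot match a piped link anyway. Real wikitext also has nested constructs such as image links with captions, which neither pass tries to handle.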