python parsing wikitext creole

How does one parse simple inline markup (e.g. *bold*) in Python?


How does one implement a parser (in Python) for a subset of wikitext that modifies text, namely:

*bold*, /italics/, _underline_ 

I'm converting it to LaTeX, so the conversion is from:

Hello, *world*! Let's /go/.

to:

Hello, \textbf{world}! Let's \textit{go}.

Nothing about this is specific to LaTeX, except (notably) nested cases like "*bold /italics* whatami/" => "\textbf{bold \textit{italics} whatami}".

I've looked at existing markup libraries, but they're (a) not quite the wiki language I'd like, and (b) seemingly overpowered for this problem.

I've considered reverse engineering Creoleparser, but I'd like to know what suggestions others have before I undertake that effort.

Thanks!


Solution

  • If your language is small, regular expressions might be the least painful solution:

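    >>> import re
    >>> text = "Hello, *world*! Let's /go/."
    >>> # Double the backslash in the replacement: re.sub would read "\t" as a tab.
    >>> text = re.sub(r"\*([^*]*)\*", r"\\textbf{\1}", text)
    >>> text = re.sub(r"/([^/]*)/", r"\\textit{\1}", text)
    >>> text = re.sub(r"_([^_]*)_", r"\\underline{\1}", text)
    >>> print(text)
    Hello, \textbf{world}! Let's \textit{go}.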
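    If you want explicit control over nesting (for example, to leave markers untouched when they overlap rather than nest), a single pass with a stack is one alternative to chained substitutions. The following is only a minimal sketch, not a full wikitext parser: the names `COMMANDS` and `to_latex` are made up for illustration, it assumes every marker character is meant as markup, and it does not escape LaTeX special characters.

```python
# Hypothetical mapping from marker character to LaTeX command.
COMMANDS = {"*": r"\textbf", "/": r"\textit", "_": r"\underline"}


def to_latex(text):
    """Convert *bold*, /italics/ and _underline_ spans, allowing proper nesting."""
    # Each stack entry is (marker that opened the span, pieces collected so far).
    # The bottom entry has no marker and collects the top-level output.
    stack = [(None, [])]
    for ch in text:
        if ch not in COMMANDS:
            stack[-1][1].append(ch)
        elif stack[-1][0] == ch:
            # The marker closes the innermost open span: wrap it in its command.
            mark, parts = stack.pop()
            stack[-1][1].append(COMMANDS[mark] + "{" + "".join(parts) + "}")
        elif any(mark == ch for mark, _ in stack):
            # The marker would close an outer span while an inner one is still
            # open (overlap, not nesting): keep it as a literal character.
            stack[-1][1].append(ch)
        else:
            # The marker opens a new span.
            stack.append((ch, []))
    # Spans that never closed are written back literally, marker included.
    while len(stack) > 1:
        mark, parts = stack.pop()
        stack[-1][1].append(mark + "".join(parts))
    return "".join(stack[0][1])


print(to_latex("*bold /italics/ whatami*"))
# prints: \textbf{bold \textit{italics} whatami}
```

    Because unmatched markers are written back unchanged, an input such as "a * b" comes out as is. Whether overlapping markers should instead be repaired or rejected is a design choice this sketch does not settle.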