Search code examples
pythonlatexsanitization

How do I sanitize LaTeX input?


I'd like to take user input (sometimes this will be large paragraphs) and generate a LaTeX document. I'm considering a couple of simple regular expressions that replaces all instances of \ with \textbackslash and all instances of { or } with \} or \{.

I doubt that this is sufficient. What else do I need to do? Note: In case there is a special library made for this, I'm using python.

To clarify, I do not wish anything to be parsed treated as LaTeX syntax: $a$ should be replaced with \$a\$.


Solution

  • If your input is plain text and you are in a normal catcode regime, you must do the following substitutions:

    • \\textbackslash{} (note the empty group!)
    • {\{
    • }\}
    • $\$
    • &\&
    • #\#
    • ^\textasciicircum{} (requires the textcomp package)
    • _\_
    • ~\textasciitilde{}
    • %\%

    In addition, the following substitutions are useful at least when using the OT1 encoding (and harmless in any case):

    • <\textless{}
    • >\textgreater{}
    • |\textbar{}

    And these three disable the curly quotes:

    • "\textquotedbl{}
    • '\textquotesingle{}
    • `\textasciigrave{}