python regex latex lilypond

Regex: parsing commands from LaTeX lines in Python


I'm trying to parse and remove any \command (\textit, etc.) from each line loaded from a .tex file, or other commands such as \clef, \key and \time from LilyPond files.

How could I do that?

What I've tried

import re
f = open('example.tex')
lines = f.readlines()
f.close()

pattern = '^\\*([a-z]|[0-9])' # this is the wrong regex!!
clean = []
for line in lines:
    remove = re.match(pattern, line)
    if remove:
        clean.append(remove.group())

print(clean)
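Running the attempted pattern on one of the sample lines shows why it fails. In a normal (non-raw) Python string, '\\' is already turned into a single backslash, so the regex engine receives ^\*..., where \* means a literal asterisk, and nothing in the example matches. Even written as a raw string, the pattern only covers the backslash and the first letter of the command, and group() returns that match rather than the text to keep. A minimal check, using a line from the example input below:

```python
import re

line = r'\item More things'

# The pattern exactly as in the attempt: Python turns '\\' into one backslash,
# so the regex engine sees ^\*([a-z]|[0-9]), i.e. a literal '*' at the start.
attempt = '^\\*([a-z]|[0-9])'
print(attempt)                  # ^\*([a-z]|[0-9])
print(re.match(attempt, line))  # None: the line does not start with '*'

# As a raw string it matches zero or more backslashes plus ONE letter,
# and group() returns that match, not the rest of the line.
raw = r'^\\*([a-z]|[0-9])'
print(re.match(raw, line).group())  # \i
```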

Example

Input

#!/usr/bin/latex

\item More things
\subitem Anything

Expected output

More things
Anything

Solution

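  • You could use a simple regex substitution with the pattern ^\\[^\s]*[ \t]*. With re.MULTILINE, ^ matches at the start of every line, so the pattern removes a backslash at the start of a line, the command name after it (any run of non-whitespace characters) and the spaces or tabs that follow it:

    Sample code in Python 3:

    import re

    # \\ is a literal backslash, [^\s]* the command name, [ \t]* trailing blanks
    p = re.compile(r"^\\[^\s]*[ \t]*", re.MULTILINE)

    # raw string, so \item and \subitem are not read as escape sequences
    text = r'''
    \item More things
    \subitem Anything
    '''

    subst = ""

    print(re.sub(p, subst, text))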
    

    The result would be:

    More things
    Anything
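The pattern above only strips a command at the very start of a line. If commands can also appear mid-line (for example \textit inside a sentence, or several LilyPond commands such as \clef and \time on one line), one option is to drop the ^ anchor and match a backslash followed by letters. The sketch below assumes command names consist of ASCII letters only and that their arguments should be kept; strip_commands is a hypothetical helper name and the sample lines are made up for illustration.

```python
import re

# A backslash followed by one or more letters (the command name), plus any
# spaces or tabs after it. Arguments such as {...} or "treble" are left alone.
COMMAND = re.compile(r"\\[A-Za-z]+[ \t]*")

def strip_commands(line):
    """Remove every \\command token from a single line of text."""
    return COMMAND.sub("", line)

lines = [
    "\\item More things\n",
    "\\subitem Anything\n",
    "Some \\textit{emphasised} words\n",
    "\\clef treble \\time 4/4\n",
]

clean = [strip_commands(line) for line in lines]
print(clean)
# ['More things\n', 'Anything\n', 'Some {emphasised} words\n', 'treble 4/4\n']
```

For a real file the same helper can be applied line by line, e.g. `with open('example.tex') as f: clean = [strip_commands(line) for line in f]`. Note that arguments survive: \textit{word} becomes {word} and \clef treble becomes treble, so removing arguments as well would need an extra, command-specific pattern.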