Replace spaces between specific characters only using regex


I am trying to replace whitespace inside LaTeX that is embedded in a Markdown document with \\; using regex.
In the Markdown package I'm using, all LaTeX is wrapped in either $ or $$.

I would like to change the following from

"dont edit this $result= \frac{1}{4}$ dont edit this $$some result=123$$"

to this

"dont edit this $result=\\;\frac{1}{4}$ dont edit this $$some\\;result=123$$"

I have managed to do it using the messy function below, but would like a cleaner, regex-based approach. Any help would be appreciated.

import re
vals = r"dont edit this $result= \frac{1}{4}$ dont edit this $$some result=123$$"
def cleanlatex(vals):
    vals = vals.replace(" ", "  ")
    char1 = r"\$\$"
    char2 = r"\$"
    indices = [i.start() for i in re.finditer(char1, vals)]
    indices += [i.start() for i in re.finditer(char2, vals.replace("$$","~~"))]

    indices.sort()
    print(indices)
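    # only replace when the delimiters pair up, i.e. their count is even
    if len(indices) % 2 == 0:
        while indices:
            # work right to left so earlier indices stay valid when the length changes
            finish = indices.pop()
            start = indices.pop()
            vals = vals[:start] + vals[start:finish].replace('  ', r'\\;') + vals[finish:]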
    
    vals = vals.replace("  ", " ")
    return vals

print(cleanlatex(vals))

Output:

[18, 39, 60, 78]   
dont edit this $result=\\;\frac{1}{4}$ dont edit this $$some\\;result=123$$

Solution

  • With regex I would still do it in two steps:

    • Identify the parts between dollars (or double dollars) using regex
    • Within those parts, replace spaces with a simple replace call

    def cleanlatex(vals):
        # Group 1 captures the opening delimiter ($ or $$); \1 requires the same one to close.
        return re.sub(r"(\$\$?)(.*?)\1", lambda m: m[0].replace(" ", r"\\;"), vals)
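As a quick check, here is a self-contained sketch that runs the one-liner on the sample string from the question. It assumes the replacement text should be the literal backslash-backslash-semicolon shown in the question's target string; `sample` and `result` are illustrative names.

```python
import re

def cleanlatex(vals):
    # Group 1 captures the opening delimiter ($ or $$); \1 requires the same one to close.
    # The lambda returns the whole match with its spaces replaced; a function's
    # return value is inserted literally, so no extra escaping is needed.
    return re.sub(r"(\$\$?)(.*?)\1", lambda m: m[0].replace(" ", r"\\;"), vals)

sample = r"dont edit this $result= \frac{1}{4}$ dont edit this $$some result=123$$"
result = cleanlatex(sample)
print(result)
# dont edit this $result=\\;\frac{1}{4}$ dont edit this $$some\\;result=123$$
```

The text outside the dollar-delimited parts is never passed to the lambda, so its spaces are left alone.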
    

    If the dollars don't pair up, this still makes replacements in every matching pair it can find and leaves the remainder untouched. That differs from your code, which replaces nothing at all when the dollars don't match.
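If the all-or-nothing behaviour matters, one possible adaptation is sketched below. The helper `cleanlatex_strict` and the constant `DELIMITED` are hypothetical names, and the check only approximates the question's parity test: it removes every well-formed span and treats any leftover dollar as a mismatch.

```python
import re

DELIMITED = re.compile(r"(\$\$?)(.*?)\1")

def cleanlatex_strict(vals):
    # Strip every well-formed $...$ or $$...$$ span; a dollar that survives
    # means some delimiter had no partner.
    if "$" in DELIMITED.sub("", vals):
        return vals  # leave the text untouched, like the question's function
    return DELIMITED.sub(lambda m: m[0].replace(" ", r"\\;"), vals)

print(cleanlatex_strict("ok $a b$ and $c d$"))  # ok $a\\;b$ and $c\\;d$
print(cleanlatex_strict("bad $a b$ and $c d"))  # bad $a b$ and $c d
```

This costs a second pass over the string, which is usually negligible for document-sized input.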

    When dollars are "nested", as in "$$nested $ here$$", the inner dollar is not treated as a delimiter by this solution. Likewise, if a double dollar directly follows a single-dollar part, the double one is interpreted as two single dollars that just happen to be adjacent. So "$part one$$part two$" yields two parts, each delimited by single dollars.
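Both cases can be inspected by listing what the pattern matches. This is a small sketch; `spans` and `DELIMITED` are illustrative names, not part of any library.

```python
import re

DELIMITED = re.compile(r"(\$\$?)(.*?)\1")

def spans(text):
    # Return each full match: the delimiters plus the enclosed content.
    return [m[0] for m in DELIMITED.finditer(text)]

print(spans("$$nested $ here$$"))     # ['$$nested $ here$$']
print(spans("$part one$$part two$"))  # ['$part one$', '$part two$']
```

In the first string the opening `$$` is captured as group 1, so only another `$$` can close the span. In the second, the first `$` is followed by a letter, so group 1 is a single `$` and the span closes at the next `$`.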

    Your question didn't give any such boundary conditions (there are quite a few of them), so the solution may need some adaptations.
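One such boundary condition, chosen here only as an illustration, is LaTeX that spans several lines. By default `.` does not match a newline, so a multi-line `$$...$$` block would not be recognised as one part. A possible adaptation, assuming newlines should be kept and only spaces replaced, passes `re.DOTALL`; the names `MULTILINE_SPAN` and `cleanlatex_multiline` are hypothetical.

```python
import re

# re.DOTALL lets "." match newlines, so one span may cover several lines.
MULTILINE_SPAN = re.compile(r"(\$\$?)(.*?)\1", re.DOTALL)

def cleanlatex_multiline(vals):
    # Only literal spaces are replaced; the newline itself is kept as-is.
    return MULTILINE_SPAN.sub(lambda m: m[0].replace(" ", r"\\;"), vals)

text = "$$a b\nc d$$"
converted = cleanlatex_multiline(text)
print(converted)
```

Whether other whitespace such as tabs or newlines should also be converted depends on how the Markdown package renders them, which the question does not specify.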