Tags: python, regex, string-parsing

Python regex doesn't find certain pattern


I am trying to parse LaTeX code out of HTML that looks like this:

string = " your answer is wrong! Solution: based on \((\vec{n_E},\vec{g})= 0 \) and \(d(g,E)=0\) beeing ... "

I want to replace each piece of LaTeX code with the output of a function that takes the LaTeX code as its argument. (Because I have not found the correct pattern yet, the function extract returns an empty string for the moment.)

I tried:

latex_end = "\)"
latex_start = "\("    
string = re.sub(r'{}.*?{}'.format(latex_start, latex_end), extract, string)

Result:

your answer is wrong! Solution: based on \= 0 \) and \=0\) beeing ...

Expected:

your answer is wrong! Solution: based on and beeing ...

Any idea why it does not find the pattern? Is there a way to implement it?


Solution

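  • You should define string as a raw string, because in a normal string literal \v is interpreted as an escape sequence (vertical tab), so \vec is silently changed. The pattern also has to match the literal backslash in front of each parenthesis: in a regex, \( matches only the character (, so the backslash itself must be escaped as well.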

    import re
    
    string = r" your answer is wrong! Solution: based on \((\vec{n_E},\vec{g})= 0 \) and \(d(g,E)=0\) beeing ... "
    
    
    string = re.sub(r'\\\(.*?\\\)', '', string)
    print(string)
    

    Prints:

     your answer is wrong! Solution: based on  and  beeing ...
    

    If you need to have variables for the start and end:

    latex_end = r"\\\)"
    latex_start = r"\\\("    
    string = re.sub(r'{}.*?{}'.format(latex_start, latex_end), '', string)
    print(string)
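
The question asked for each match to be replaced by the output of a function rather than by an empty string. A minimal sketch of that, assuming the delimiters are kept as plain text and escaped with re.escape, could look like the following. The body of extract here is a hypothetical stand-in (it only wraps the LaTeX in square brackets), and the names text, pattern and result are chosen for illustration.

```python
import re

# Raw string, so that \v in \vec is not turned into a vertical tab.
text = r" your answer is wrong! Solution: based on \((\vec{n_E},\vec{g})= 0 \) and \(d(g,E)=0\) beeing ... "

# Plain delimiters as they appear in the text; re.escape makes them regex-safe.
latex_start = r"\("
latex_end = r"\)"
pattern = re.compile(re.escape(latex_start) + r"(.*?)" + re.escape(latex_end))

def extract(match):
    # Hypothetical stand-in for the real function: re.sub passes a match
    # object, and group(1) is the LaTeX between the delimiters.
    latex = match.group(1).strip()
    return "[" + latex + "]"

result = pattern.sub(extract, text)
print(result)
```

Because re.sub calls the function with a match object and not with a string, the capture group is what gives the function access to the LaTeX itself. Using re.escape avoids having to work out by hand how many backslashes each delimiter needs.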