Tags: python, nlp, python-re

How to find a pattern inside a pattern when the start and end are known?


I have text delimited by a known starting and ending pattern:

start = '\n\\[\n'
end = '\n\\]\n'

My string is:

'The above mentioned formal formula is\nthat of\n\\[\n\\oplus \\bigoplus_{(5)} \\widehat{C_{(5)}} A_{5} G(2)\n\\]\nA. Tobacco\nB. Tulip\nc. soybean\nD. Sunhemp'

I want to find:

'\n\\oplus \\bigoplus_{(5)} \\widehat{C_{(5)}} A_{5} G(2)'

I tried:

re.findall(r'\s*\\+\n\\[\n(.*?)\\+\n\\]\n', mystring)

r'\s*\\+\[(.*?)\\+\]' # did not work either

Both attempts return an empty list. What am I doing wrong here?


Solution

  • This works for me:

    mystring = 'The above mentioned formal formula is\nthat of\n\\[\n\\oplus \\bigoplus_{(5)} \\widehat{C_{(5)}} A_{5} G(2)\n\\]\nA. Tobacco\nB. Tulip\nc. soybean\nD. Sunhemp'
    
    expected_result = '\n\\oplus \\bigoplus_{(5)} \\widehat{C_{(5)}} A_{5} G(2)'
    
    import codecs
    import re
    
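    # Match against repr(mystring), where newlines and backslashes appear as
    # the literal escape sequences \n and \\.
    matches = re.findall(r'\\n\\\\\[(\\n.*)\\n\\\\\]\\n', repr(mystring))
    
    # Turn the escape sequences in each match back into real characters.
    results = [codecs.decode(match, 'unicode_escape') for match in matches]
    
    print(results)
    # ['\n\\oplus \\bigoplus_{(5)} \\widehat{C_{(5)}} A_{5} G(2)']
    
    print(results[0] == expected_result)
    # True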
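
An alternative that avoids the round trip through repr() is to match the original string directly. The sketch below assumes the delimiters are fixed literal strings; the helper name find_between is made up for illustration and is not part of the re module. It builds the pattern with re.escape so that `[`, `]` and the backslashes are treated literally, and passes re.DOTALL so that `.` can cross newlines. The two patterns in the question likely returned nothing for those same reasons: in the first, the unescaped `[` opens a character class, and in the second, `.` stops at the newline that follows `\[`.

```python
import re

mystring = ('The above mentioned formal formula is\nthat of\n\\[\n'
            '\\oplus \\bigoplus_{(5)} \\widehat{C_{(5)}} A_{5} G(2)\n\\]\n'
            'A. Tobacco\nB. Tulip\nc. soybean\nD. Sunhemp')


def find_between(text, start, end):
    """Return every substring found between the literal delimiters start and end.

    Hypothetical helper for illustration only.
    """
    # re.escape makes the delimiters literal, so '[' and '\\' need no manual escaping.
    # '.*?' is lazy, so each match stops at the nearest end delimiter.
    pattern = re.escape(start) + r'(.*?)' + re.escape(end)
    # re.DOTALL lets '.' match newline characters as well.
    return re.findall(pattern, text, flags=re.DOTALL)


# The start delimiter omits its trailing newline so that the captured text
# keeps the leading '\n' asked for in the question.
matches = find_between(mystring, '\n\\[', '\n\\]\n')
print(matches)
```

Passing the full '\n\\[\n' as the start delimiter instead drops the leading newline from the captured text.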