
How to extract text between triple backticks (```) using a regular expression in Python


I have the following raw string:

s = "###Sample Input\r\n```\r\n3\r\n100 400 1000 1200\r\n100 450 1000 1350\r\n150 400 1200 1200\r\n```"

I want to extract the text between the triple backticks, which is '3\r\n100 400 1000 1200\r\n100 450 1000 1350\r\n150 400 1200 1200\r\n'

I first convert this raw string into a Python string and then apply the following pattern:

import re

pattern = r"Sample Input/s/s('''.*''')"
match = re.findall(pattern, s)
print(match)

But I only get an empty list as output. What is the correct regular expression for extracting the text between the triple backticks?
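For comparison, here is a sketch of why the pattern above returns an empty list: `/s` matches a literal slash followed by the letter `s`, not the whitespace class `\s`; `'''` looks for single quotes while the text uses backticks; and without the `re.DOTALL` flag, `.` cannot match the `\n` characters inside the block. Correcting those three points (using `\s+` for the `\r\n` line break and keeping the backticks outside the capture group) gives a working variant of the same pattern:

```python
import re

s = "###Sample Input\r\n```\r\n3\r\n100 400 1000 1200\r\n100 450 1000 1350\r\n150 400 1200 1200\r\n```"

# Pattern from the question: "/s" matches a literal "/s", "'''" matches
# single quotes, and "." stops at "\n", so nothing matches.
broken_result = re.findall(r"Sample Input/s/s('''.*''')", s)
print(broken_result)  # []

# Same idea with \s+ for the line break, backticks instead of quotes
# (kept outside the group), and re.DOTALL so "." also matches "\n".
fixed_result = re.findall(r"Sample Input\s+```(.*?)```", s, flags=re.DOTALL)
print(fixed_result)
```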


Solution

  • Use

    ```([\w\W]*?)```
    


    EXPLANATION

    --------------------------------------------------------------------------------
      ```                      '```'
    --------------------------------------------------------------------------------
      (                        group and capture to \1:
    --------------------------------------------------------------------------------
        [\w\W]*?                 any character of: word characters (a-z,
                                 A-Z, 0-9, _), non-word characters (all
                                 but a-z, A-Z, 0-9, _) (0 or more times
                                 (matching the least amount possible))
    --------------------------------------------------------------------------------
      )                        end of \1
    --------------------------------------------------------------------------------
      ```                      '```'
    

    Python code:

    import re

    s = "###Sample Input\r\n```\r\n3\r\n100 400 1000 1200\r\n100 450 1000 1350\r\n150 400 1200 1200\r\n```"
    # Raw string so "\w" and "\W" reach the regex engine unchanged
    matches = [m.group(1) for m in re.finditer(r"```([\w\W]*?)```", s)]
    print(matches)
    

    Results: ['\r\n3\r\n100 400 1000 1200\r\n100 450 1000 1350\r\n150 400 1200 1200\r\n']
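Note that this result still starts with the '\r\n' that follows the opening backticks, whereas the question asks for the text without it. A minimal sketch of two variations: the first shows that `[\w\W]*?` behaves like `.*?` with the `re.DOTALL` flag, and the second consumes the line break after the opening backticks so it stays out of the group. The `\r?\n` part assumes the opening backticks are immediately followed by a line break (Windows `\r\n` or Unix `\n`); if that is not guaranteed, `\s*` is a looser alternative.

```python
import re

s = "###Sample Input\r\n```\r\n3\r\n100 400 1000 1200\r\n100 450 1000 1350\r\n150 400 1200 1200\r\n```"

# [\w\W] matches any character, newlines included; "." does the same
# only when re.DOTALL is set, so these two calls give the same result.
with_class = re.findall(r"```([\w\W]*?)```", s)
with_dotall = re.findall(r"```(.*?)```", s, flags=re.DOTALL)
print(with_class == with_dotall)  # True

# Assumes a line break right after the opening backticks: matching it
# outside the group leaves exactly the text requested in the question.
exact = re.findall(r"```\r?\n(.*?)```", s, flags=re.DOTALL)
print(exact)
```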