Tags: python, regex, python-re

Python regex to find all strings that start with ' and end with '.tr, ignoring leading and trailing whitespace


I am struggling to get the correct regex for my script. I would like to find all substrings in a file that start with a `'` and end with `'.tr`, and save all these matches in a list.

This is what I've got so far:

import glob
import pathlib
import re
       
libPathString = str(pathlib.Path.cwd().parent.resolve()) 

for path in glob.glob(libPathString + "/**", recursive=True):
    if(".dart" in path):
        with open(path, 'r+', encoding="utf-8") as file:
            data = [line.strip() for line in file.readlines()]
            data = ''.join(data)
            words = re.findall(r'\'.*\'.tr', data)
            print(words)

The first problem is that `words` does not contain just the matching substring; each match contains everything from the first `'` in the file up to the substring.
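A minimal reproduction of this first problem, using a made-up one-line input (the `data` string is hypothetical, not taken from a real `.dart` file). The greedy `.*` in the question's pattern runs as far right as it can, so it swallows everything between the first `'` and the last `'` followed by any char and `tr`; a negated character class such as `[^']*` cannot cross a closing quote and returns each string separately.

```python
import re

# Hypothetical one-line input, similar to what the question's
# strip-and-join step produces from a .dart file.
data = "foo('Hallo'.tr);bar('Welt'.tr);"

# The question's pattern: .* is greedy, so the match runs from the
# first ' to the last place where ' + any char + "tr" occurs.
greedy = re.findall(r'\'.*\'.tr', data)
print(greedy)  # ["'Hallo'.tr);bar('Welt'.tr"]

# [^']* stops at the next quote, so each string is matched on its own.
bounded = re.findall(r"'[^']*'\.tr", data)
print(bounded)  # ["'Hallo'.tr", "'Welt'.tr"]
```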

It also matches this snippet:

  child: Hero(
    tag: heroTag ?? '',  // <- because of this, since the next line starts with `tr`
    transitionOnUserGestures: true,
    child: Material(

But this should not match!
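This false positive can be reproduced with the `Hero(...)` lines above, joined the way the question's script joins them (the `// <-` annotation is left out, assuming it is not in the real file). In the question's pattern the `.` before `tr` is unescaped, so it matches any character, including the comma after `''`. With the dot escaped as `\.`, only a literal `.tr` is accepted and the snippet no longer matches.

```python
import re

# The Hero(...) snippet, stripped and joined like in the question's script.
lines = [
    "child: Hero(",
    "  tag: heroTag ?? '',",
    "  transitionOnUserGestures: true,",
    "  child: Material(",
]
data = "".join(line.strip() for line in lines)
print(data)

# Unescaped: '.' matches the comma, and "tr" comes from "transition...".
unescaped = re.findall(r'\'.*\'.tr', data)
print(unescaped)  # ["'',tr"]

# Escaped: '\.' requires a literal dot, so nothing is found.
escaped = re.findall(r"'[^']*'\.tr", data)
print(escaped)  # []
```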

And it does not find this:

  AutoSizeText(
      'Das ist ein langer Text, der immer in einer Zeile ist.'
          .tr,
      style: AppTextStyles.montserratH4Regular,

This one should match!

What am I missing here?


Solution

  • You can use

    words = re.findall(r"'[^'\\]*(?:\\.[^'\\]*)*'\s*\.tr\b", data)
    

    See the Python demo. Details:

    • '[^'\\]*(?:\\.[^'\\]*)*' - a single quote, zero or more chars other than ' and \, then zero or more sequences of a \ followed by any single char and zero or more chars other than ' and \, and finally a closing single quote (this matches a string between ' chars with any escaped chars, such as \', inside it; unlike the greedy .*, it cannot run past the closing quote)
    • \s* - zero or more whitespaces (this will match any whitespace, including line breaks)
    • \.tr - the literal string .tr (note the escaped ., which now matches only a literal dot; the unescaped . in the original pattern also matched the comma after '')
    • \b - a word boundary, so that a call such as 'abc'.trim() is not matched.
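A sketch of the question's script rewritten around this pattern, assuming Python 3 and UTF-8 encoded `.dart` files. The names `TR_STRING`, `find_tr_strings`, `scan_dart_files` and `SAMPLE` are hypothetical, chosen for illustration. Besides the new pattern, the main change is that each file is read whole with its line breaks intact instead of being stripped and joined, so `\s*` can bridge the newline between the string and `.tr` in the `AutoSizeText` case.

```python
from __future__ import annotations

import pathlib
import re

# The answer's pattern: a single-quoted string (escaped chars allowed),
# optional whitespace including newlines, then a literal ".tr" word.
TR_STRING = re.compile(r"'[^'\\]*(?:\\.[^'\\]*)*'\s*\.tr\b")


def find_tr_strings(text: str) -> list[str]:
    """Return every '...'.tr occurrence in text (hypothetical helper)."""
    return TR_STRING.findall(text)


def scan_dart_files(root: pathlib.Path) -> dict[pathlib.Path, list[str]]:
    """Map each .dart file under root to its matches (hypothetical helper)."""
    results = {}
    for path in root.rglob("*.dart"):
        # Read the file as-is: keeping the line breaks lets \s* bridge
        # a string and a .tr that sits on the next line.
        matches = find_tr_strings(path.read_text(encoding="utf-8"))
        if matches:
            results[path] = matches
    return results


# Hypothetical sample built from the question's snippets, plus an escaped
# quote and a .trim() call. Raw string, so \' stays a backslash + quote.
SAMPLE = r"""
AutoSizeText(
    'Das ist ein langer Text, der immer in einer Zeile ist.'
        .tr,
    style: AppTextStyles.montserratH4Regular,
),
Text('Don\'t panic'.tr),
Text('abc'.trim()),
Hero(
  tag: heroTag ?? '',
  transitionOnUserGestures: true,
),
"""

if __name__ == "__main__":
    # Only the AutoSizeText string and the escaped-quote string match.
    for match in find_tr_strings(SAMPLE):
        print(repr(match))
```

To scan the same directory as the question's script, call `scan_dart_files(pathlib.Path.cwd().parent.resolve())`. One limitation: the pattern assumes every `'` in the file opens or closes a string literal, so an apostrophe inside a comment or a double-quoted string can throw the pairing of quotes off.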