I am struggling to get the correct regex for my script. I would like to find all substrings in a file that start with `'` and end with `'.tr`, and save all these matches in a list.
This is what I've got so far:
```python
import glob
import pathlib
import re

libPathString = str(pathlib.Path.cwd().parent.resolve())
for path in glob.glob(libPathString + "/**", recursive=True):
    if(".dart" in path):
        with open(path, 'r+', encoding="utf-8") as file:
            data = [line.strip() for line in file.readlines()]
            data = ''.join(data)
            words = re.findall(r'\'.*\'.tr', data)
            print(words)
```
The first problem is that `words` does not contain just the matching substring but the whole file up to the substring.
Also, it matches this part of a file:
```dart
child: Hero(
  tag: heroTag ?? '', // <- because of this and the line below starts with `tr`
  transitionOnUserGestures: true,
  child: Material(
```
But this should not match!
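A minimal reproduction of this false match, assuming the script's strip-and-join step and leaving out the `// <-` annotation (presumably added only for this post). The unescaped `.` in the pattern matches the `,` after the empty string `''`, and the joined next line happens to start with `tr`:

```python
import re

# The Hero snippet, stripped and joined exactly as the question's script does.
lines = [
    "child: Hero(",
    "tag: heroTag ?? '',",
    "transitionOnUserGestures: true,",
    "child: Material(",
]
data = "".join(line.strip() for line in lines)

# Unescaped ".": the "," after '' satisfies it, then "tr" comes from the next line.
bad = re.findall(r'\'.*\'.tr', data)
print(bad)  # ["'',tr"]

# Escaping the dot (\.tr) demands a literal ".tr", which this snippet lacks.
escaped = re.findall(r"'.*'\.tr", data)
print(escaped)  # []
```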
And it does not find this:
```dart
AutoSizeText(
  'Das ist ein langer Text, der immer in einer Zeile ist.'
  .tr,
  style: AppTextStyles.montserratH4Regular,
```
This one should match!
What am I missing here?
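Both symptoms can be reproduced together, assuming (hypothetically) that the two snippets above sit in the same file, Hero first, and are stripped and joined as in the script. The greedy `.*` starts at the first `'` (in `''`) and runs to the last `'.tr` it can find, so `findall` returns a single long match that swallows the `AutoSizeText` string instead of returning it on its own:

```python
import re

# Hypothetical file: the Hero snippet followed by the AutoSizeText snippet,
# stripped and joined the same way as in the question's script.
lines = [
    "child: Hero(",
    "tag: heroTag ?? '',",
    "transitionOnUserGestures: true,",
    "child: Material(",
    "AutoSizeText(",
    "'Das ist ein langer Text, der immer in einer Zeile ist.'",
    ".tr,",
    "style: AppTextStyles.montserratH4Regular,",
]
data = "".join(line.strip() for line in lines)

# Greedy ".*" backtracks only as far as the LAST "'" + any char + "tr".
words = re.findall(r'\'.*\'.tr', data)
print(len(words))  # 1
print(words[0])    # starts at the Hero tag's '' and ends at "ist.'.tr"
```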
You can use

```python
words = re.findall(r"'[^'\\]*(?:\\.[^'\\]*)*'\s*\.tr\b", data)
```
See the Python demo. Details:
- `'[^'\\]*(?:\\.[^'\\]*)*'` - a `'`, zero or more chars other than `'` and `\`, then zero or more sequences of a `\` followed by any single char and zero or more chars other than `'` and `\`, and then a closing `'` (this matches strings between `'` chars with any escaped chars in between)
- `\s*` - zero or more whitespaces (this matches any whitespace, including line breaks)
- `\.tr` - the `.tr` string (note the escaped `.`, which now matches a literal dot)
- `\b` - a word boundary (so, for example, `.trim` is not matched).
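The answer's pattern can be wrapped in a small helper and checked against the question's snippets. This is a sketch: `find_tr_strings` and `scan_dart_files` are hypothetical names, the last sample line with an escaped quote is made up to exercise the `\\.` branch, and the text is searched without the strip-and-join step, since `\s*` already spans the line break between the closing quote and `.tr`.

```python
import re
from pathlib import Path

# The answer's pattern, compiled once.
TR_PATTERN = re.compile(r"'[^'\\]*(?:\\.[^'\\]*)*'\s*\.tr\b")


def find_tr_strings(text):
    """Return every '...'.tr occurrence (whitespace before .tr is kept)."""
    return TR_PATTERN.findall(text)


def scan_dart_files(root):
    """Hypothetical helper: map each *.dart file under root to its matches."""
    results = {}
    for path in Path(root).rglob("*.dart"):
        matches = find_tr_strings(path.read_text(encoding="utf-8"))
        if matches:
            results[str(path)] = matches
    return results


# The question's two snippets plus one made-up line with an escaped quote.
# A raw string keeps the backslash in It\'s, as it would be in the .dart file.
sample = r"""child: Hero(
  tag: heroTag ?? '',
  transitionOnUserGestures: true,
  child: Material(
AutoSizeText(
  'Das ist ein langer Text, der immer in einer Zeile ist.'
  .tr,
  style: AppTextStyles.montserratH4Regular,
Text('It\'s fine'.tr),
"""

words = find_tr_strings(sample)
for w in words:
    print(repr(w))
```

To scan a project the way the question's script does, something like `scan_dart_files(Path.cwd().parent)` should work; note that `rglob("*.dart")` only picks files ending in `.dart`, which is slightly stricter than the question's `".dart" in path` check.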