I have a text file like this:
Some text and random stuff that I don't need
2 8
2 9 T
4 9
1 10
2 10 F
7 11 T
More random stuff
How should I construct a regex to extract both the rows that contain only numbers and the rows with numbers followed by T or F? So far my code looks like this:
data = []
with open(file, 'r') as log_file:
    # opening file
    file = log_file
    while True:
        line = file.readlines()
        # if line in regex function:
        data.append(line)
        # closing file
        break
How can I solve this?
With this approach, the regex matches only lines that consist of two numbers, optionally followed by a space and the letter T or F. A for loop over the lines also fits better here than a while loop.
import re

matched_data = []
with open(file, 'r') as log_file:
    data = log_file.readlines()

for line in data:
    line = line.strip()
    if re.match(r'^\d+ \d+( [TF])?$', line):
        matched_data.append(line)
print(matched_data)
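The same filter can be wrapped in a small reusable function. This is only a sketch, assuming the fields are separated by single spaces as in the sample; `extract_rows` and `ROW_PATTERN` are hypothetical names. It compiles the pattern once with `re.compile`, uses `fullmatch` so the `^`/`$` anchors are unnecessary, and accepts any iterable of lines, including an open file object, so `readlines()` is not needed.

```python
import re

# Hypothetical name: two integers, optionally followed by a space and T or F.
ROW_PATTERN = re.compile(r'\d+ \d+(?: [TF])?')


def extract_rows(lines):
    """Return the stripped lines that consist entirely of a matching row."""
    rows = []
    for line in lines:
        line = line.strip()
        # fullmatch requires the whole (already stripped) line to match,
        # which is what ^...$ does in the re.match version.
        if ROW_PATTERN.fullmatch(line):
            rows.append(line)
    return rows


# Usage with a real file ('log.txt' is a placeholder path):
# with open('log.txt') as log_file:
#     print(extract_rows(log_file))

sample = ["Some text", "2 8\n", "2 9 T\n", "7 11 X\n", "More random stuff"]
print(extract_rows(sample))  # ['2 8', '2 9 T']
```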
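If some of the lines start with a letter, e.g.
T 7 11
and you want to match those as well, the pattern has to allow an optional leading T or F. Because | has the lowest precedence in a regex, group the alternatives so both are anchored at the start and end of the line: r'^(?:[TF] \d+ \d+|\d+ \d+(?: [TF])?)$'. An ungrouped r'^[TF]|\d+ \d+( [TF])?$' would accept any line that merely starts with T or F, such as "The random stuff".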
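A quick way to see the effect of alternation precedence is to run an ungrouped and a grouped pattern over a few lines. This sketch assumes a row may have a leading flag or a trailing flag but not both; the sample lines are made up for illustration.

```python
import re

lines = ["T 7 11", "2 9 T", "The random stuff", "F 1 2 T", "5 B 37"]

# Ungrouped: '|' splits the whole regex, so '^[TF]' alone is one branch
# and matches any line that starts with T or F.
loose = r'^[TF]|\d+ \d+( [TF])?$'
# Grouped: a leading flag OR a trailing flag, anchored on both ends.
strict = r'^(?:[TF] \d+ \d+|\d+ \d+(?: [TF])?)$'

print([line for line in lines if re.match(loose, line)])
# ['T 7 11', '2 9 T', 'The random stuff', 'F 1 2 T']
print([line for line in lines if re.match(strict, line)])
# ['T 7 11', '2 9 T']
```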
Test Code:
import re
data = """
2 8
2 9 T
4 9
1 10
2 10 F
7 11 T
5 B 37
Y 9 G
T 7 11
MG 99 Z
"""
data = data.splitlines()
matched_data = []
for line in data:
    line = line.strip()
    if re.match(r'^\d+ \d+( [TF])?$', line):
        matched_data.append(line)
print(matched_data)
# ['2 8', '2 9 T', '4 9', '1 10', '2 10 F', '7 11 T']
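When the whole text is already in one string, as in the test code, an alternative sketch skips `splitlines()` and the loop: `re.findall` with the `re.MULTILINE` flag makes `^` and `$` match at every line boundary. This assumes single spaces between fields and no trailing whitespace or `\r` on matching lines, since nothing is stripped here. The group must be non-capturing, otherwise `findall` returns only the group's text instead of the whole row.

```python
import re

data = "Some text\n2 8\n2 9 T\n5 B 37\nT 7 11\n2 10 F\nMore random stuff\n"

# re.MULTILINE: ^ and $ match at the start/end of each line.
# (?: [TF])? is non-capturing, so findall returns the full matched row.
matched_data = re.findall(r'^\d+ \d+(?: [TF])?$', data, re.MULTILINE)
print(matched_data)  # ['2 8', '2 9 T', '2 10 F']
```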