I am writing code to extract the useful information from a very large Source.txt file.
A sample of my source file is below:
Test case AAA
Current Parameters:
Some unique param : 1
Some unique param : 2
Some unique param : 3
Some unique param : 4
*A line of rubbish*
*Another line of rubbish*
*Yet another line of rubbish*
*More and more rubbish*
Test AAA PASS
Test case BBB
Current Parameters:
Some unique param : A
Some unique param : B
Some unique param : C
Some unique param : D
*A line of rubbish*
*Another line of rubbish*
*Yet another line of rubbish*
*More and more rubbish*
Test BBB PASS
Now I am writing code to extract only the Test case and Current Parameters lines:
processed = []

def main():
    source_file = open("Source.txt","r") #Open the raw trace file in read mode
    if source_file.mode == "r":
        contents = source_file.readlines() #Read the contents of the file
        processed_contents = _process_content(contents)
        output_file = open("Output.txt","w")
        output_file.writelines(processed_contents)
    pass

def _process_content(contents):
    for raw_lines in contents:
        if "Test case" in raw_lines:
            processed.append(raw_lines)
        elif "Current Parameters" in raw_lines:
            processed.append(raw_lines)
            #I am stuck here
        elif "PASS" in raw_lines or "FAIL" in raw_lines:
            processed.append(raw_lines)
            processed.append("\n")
    return processed

#def _process_parameters():

if __name__ == '__main__':
    main()
After the Current Parameters line, I want to grab each of the Some unique param lines, which will not always be the same, and append them to the processed list so that they also appear in my Output.txt.
My desired output is:
Test case AAA
Current Parameters:
Some unique param : 1
Some unique param : 2
Some unique param : 3
Some unique param : 4
Test AAA PASS
Test case BBB
Current Parameters:
Some unique param : A
Some unique param : B
Some unique param : C
Some unique param : D
Test BBB PASS
As you can see, I want to remove all the rubbish lines. Note that there is a lot of rubbish in my Source.txt. I am not sure how to advance to the next raw_lines from inside that branch. I would appreciate your help.
Here is one approach that uses a regex to recognise the parameter lines and next() to advance to the following line of the file:
import re

filename = "Source.txt"  # path to the input file
result = []
with open(filename) as infile:
    for raw_lines in infile:
        if "Test case" in raw_lines:
            result.append(raw_lines)
        if "Current Parameters" in raw_lines:
            result.append(raw_lines)
            # next() moves to the next line; "" is returned at end of file
            raw_lines = next(infile, "")
            while True:
                m = re.search(r"(?P<params>\s*\w+\s*:\s*\w+\s*)", raw_lines)
                if not m:
                    break  # first non-parameter line ends the block
                result.append(m.group("params"))
                raw_lines = next(infile, "")
        if "PASS" in raw_lines or "FAIL" in raw_lines:
            result.append(raw_lines)
            result.append("\n")
print(result)
Output (the pattern keeps only the last word before the colon, so the "Some unique" prefix is dropped):
['Test case AAA\n',
'Current Parameters:\n',
' param : 1\n',
' param : 2\n',
' param : 3\n',
' param : 4\n',
'Test AAA PASS\n',
'\n',
'Test case BBB\n',
'Current Parameters:\n',
' param : A\n',
' param : B\n',
' param : C\n',
' param : D\n',
'Test BBB PASS',
'\n']
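If the whole parameter line should be kept, as in the desired output from the question, a flag-based variant may be simpler. The following is only a sketch under some assumptions: a parameter line looks like "name : value", rubbish lines start with an asterisk as in the sample, and the first non-parameter line ends the parameter block. The names PARAM_RE, extract and extract_file are made up for this illustration. It also keeps the blank separator line after each result, as the code in the question does.

```python
import re

# Assumed shape of a parameter line: "<name> : <value>".
# Lines starting with "*" (the rubbish lines in the sample) do not match.
PARAM_RE = re.compile(r"^\s*[^:*]+:\s*\S+")


def extract(lines):
    """Yield test headers, parameter lines and results; skip everything else."""
    in_params = False  # True only while directly below "Current Parameters"
    for line in lines:
        if "Test case" in line:
            in_params = False
            yield line
        elif "Current Parameters" in line:
            in_params = True
            yield line
        elif "PASS" in line or "FAIL" in line:
            in_params = False
            # The last line of a file may lack its trailing newline.
            yield line if line.endswith("\n") else line + "\n"
            yield "\n"  # blank separator between test cases
        elif in_params and PARAM_RE.match(line):
            yield line  # keep the full parameter line unchanged
        else:
            in_params = False  # first rubbish line closes the parameter block


def extract_file(source, target):
    """Stream source line by line so a very large file is never fully loaded."""
    with open(source) as infile, open(target, "w") as outfile:
        outfile.writelines(extract(infile))
```

Calling extract_file("Source.txt", "Output.txt") would then write the filtered lines to the output file. Because extract is a generator fed directly from the file object, only one line is held in memory at a time, which should help with a very large source file.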