I have a log file that contains the following information:
- TEST B1
B1_D<4.9b(8.9,9.25)B
B1_D=16.9b(15.9,17.25)B
H32_DOT_FAT<4.9H(5.9,7.25)H
H32_DOT_FBAT<4.9H(5.9,7.25)H
R31=1.5K(1.45K,1.54K)R
R33=3.8K(3.62K,4.17K)R
I want to parse this file as follows:
Legend
SCN : first line
STEP : the first part when splitting on the "_" separator
CHILD : the rest after splitting on the "_" separator
MESURE_CHILD : the part after the "=" or "<" separator
Expected output:
SCN = TEST B1
STEP = B1
CHILD = D
MESURE_CHILD : 4.9b(8.9,9.25)B
CHILD : D
MESURE_CHILD : 16.9b(15.9,17.25)B
STEP = H32
CHILD = DOT_FAT
MESURE_CHILD : 4.9H(5.9,7.25)H
CHILD : DOT_FBAT
MESURE_CHILD : 4.9H(5.9,7.25)H
STEP = R31
CHILD : R31
MESURE_CHILD : 1.5K(1.45K,1.54K)R
STEP = R33
CHILD : R33
MESURE_CHILD : 3.8K(3.62K,4.17K)R
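The split rules in the legend map onto two standard-library calls: `re.split` with `maxsplit=1` cuts a line at the first "=" or "<", and `str.partition` cuts the name at the first "_" only, which keeps a CHILD such as DOT_FAT in one piece. A minimal sketch; `split_line` is a hypothetical helper name, and it assumes every measurement line contains "=" or "<".

```python
import re


def split_line(line):
    """Split one measurement line into (STEP, CHILD, MESURE_CHILD)."""
    # Split once on the first "=" or "<": left is the name, right is the measure.
    name, measure = re.split(r"[=<]", line.strip(), maxsplit=1)
    # partition() splits on the FIRST "_" only, so "DOT_FAT" stays intact.
    step, sep, child = name.partition("_")
    # Lines without "_" (e.g. "R31") reuse the step name as the child name.
    return step, (child if sep else step), measure


for sample in ["B1_D<4.9b(8.9,9.25)B", "H32_DOT_FAT<4.9H(5.9,7.25)H", "R31=1.5K(1.45K,1.54K)R"]:
    print(split_line(sample))
```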
I'm using Python 3.8. Below is the method I wrote, but I haven't found a clean solution:
def createTreeStandardBloc(self, data_bloc):
    index_bloc = 1
    data = data_bloc[ 0 ]
    if ( "=" in data [ index_bloc ] or "<" in data [ index_bloc ] ) :
        prefix = re.split(r'(<|=)\s*', data [ index_bloc ] )[ 0 ]
        if ( "_" in prefix ):
            step_name = re.split( "_" , prefix )[ 0 ]
        else:
            step_name = prefix
        print("STEP : "+step_name)
    for index_bloc in range( 2 , len( data ) ) :
        if ( "=" in data [ index_bloc ] or "<" in data [ index_bloc ] ) :
            prefix_pdm = re.split(r'(<|=)\s*', data [ index_bloc ] )[ 0 ]
            if ( "_" in prefix_pdm ):
                step_name_temp = re.split( "_" , prefix_pdm ) [ 0 ]
                pdm = re.split( "_" , prefix_pdm ) [ 1 ]
            else:
                step_name_temp = prefix_pdm
                pdm = prefix_pdm
            if ( step_name_temp != step_name ):
                step_name = step_name_temp
                print("STEP : "+step_name)
                print("CHILD : "+pdm)
            else :
                print("CHILD : "+pdm)
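The `step_name` / `step_name_temp` bookkeeping in the method above is exactly what `itertools.groupby` does: it yields one group per run of consecutive items sharing a key. A minimal sketch, assuming the block arrives as one string whose first line is the SCN; `parse` and `format_bloc` are hypothetical names, and the output uses ":" throughout instead of mixing "=" and ":".

```python
import re
from itertools import groupby


def parse(line):
    # Everything before the first "=" or "<" is the name; the rest is the measure.
    name, measure = re.split(r"[=<]", line.strip(), maxsplit=1)
    # Split the name at the first "_" only: "H32_DOT_FAT" -> ("H32", "DOT_FAT").
    step, sep, child = name.partition("_")
    return step, (child if sep else step), measure


def format_bloc(text):
    # Drop blank lines; the first remaining line is the SCN ("- TEST B1").
    lines = [ln for ln in text.splitlines() if ln.strip()]
    out = ["SCN : " + lines[0].strip("- ")]
    rows = [parse(ln) for ln in lines[1:]]
    # One group per run of consecutive rows with the same STEP,
    # replacing the manual step_name != step_name_temp check.
    for step, group in groupby(rows, key=lambda row: row[0]):
        out.append("STEP : " + step)
        for _, child, measure in group:
            out.append("CHILD : " + child)
            out.append("MESURE_CHILD : " + measure)
    return out


sample = """- TEST B1
B1_D<4.9b(8.9,9.25)B
B1_D=16.9b(15.9,17.25)B
R31=1.5K(1.45K,1.54K)R
"""
report = format_bloc(sample)
print("\n".join(report))
```

Because `groupby` only merges consecutive rows, a step that reappears later in the file gets a second STEP header, which matches how the original loop behaves.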
You can use a regular expression to extract the parts from each line:
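import re

# data is the whole log file as one string, e.g. data = open("log.txt").read()
last_s = None
for i, line in enumerate(data.splitlines()):
    if i == 0:
        # first line is the scenario name: "- TEST B1" -> "TEST B1"
        print("SCN:", line.strip("- "))
    elif line.strip():
        # raw string, so "\s" and "\w" reach the regex engine unchanged
        s, c, mc = re.match(r"^\s*([^_]+)(_\w+)?[<>=](.*)\s*$", line).groups()
        if s != last_s:
            print("STEP", s)
        # c keeps its leading "_" (e.g. "_D"), so drop it; no "_" means CHILD = STEP
        print("CHILD", c[1:] if c else s)
        print("MEASURE_CHILD", mc)
        last_s = s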
Let's break this down a bit:
^\s*       -- start of line, optionally followed by whitespace
([^_]+)    -- one or more non-underscore characters: the STEP (1st group)
(_\w+)?    -- optionally, an underscore followed by word characters: the CHILD (2nd group)
[<>=](.*)  -- a comparison sign, then the measure (3rd group)
\s*$       -- optional trailing whitespace, then end of line
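The same breakdown can live inside the pattern itself: with `re.VERBOSE`, whitespace in the pattern is ignored and `#` starts a comment, so each piece carries its own explanation. A sketch using the answer's pattern unchanged (`LINE_RE` is a hypothetical name). Note that the 2nd group includes the leading underscore and is `None` when the line has no "_".

```python
import re

# The answer's pattern, written with re.VERBOSE so each part has its comment.
LINE_RE = re.compile(r"""
    ^\s*          # start of line, optionally followed by whitespace
    ([^_]+)       # 1st group: STEP, one or more non-underscore chars
    (_\w+)?       # 2nd group (optional): underscore, then the CHILD name
    [<>=]         # the comparison sign
    (.*)          # 3rd group: the measure
    \s*$          # optional whitespace, then end of line
""", re.VERBOSE)

for line in ["B1_D<4.9b(8.9,9.25)B", "H32_DOT_FAT<4.9H(5.9,7.25)H", "R33=3.8K(3.62K,4.17K)R"]:
    print(LINE_RE.match(line).groups())
```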