How can I parse the following data using regular expressions?
Test data 1
Measurement 1 X : -0.100 Y : 2.300
Something 1 : 0.00
Stuff 1 : 0.00
Needed 1 X : -0.800 Y : 5.300
Test data 2
Measurement 1 X : -0.600 Y : 4.300
Something 1 : 0.30
Stuff 1 : -0.20
Extra 1 : -0.800
I want to extract the Measurement 1 data (X and Y values) and the Needed 1 data (X and Y values) from Test data 1.
I also want to extract the Measurement 1 data (X and Y values) and the Extra 1 data from Test data 2.
The measurements have the same names; they just appear under different table names.
for line in data:
    if "Test data 1" in line:
        match = re.match(r" Measurement 1 X : ([\-\d\.]+) Y : ([\-\d\.]+)\s*$", line)
        if match:
            X_table1 = match.group(1)
            Y_table1 = match.group(2)
    if "Test data 2" in line:
        match = re.match(r" Measurement 1 X : ([\-\d\.]+) Y : ([\-\d\.]+)\s*$", line)
        if match:
            X_table2 = match.group(1)
            Y_table2 = match.group(2)
Thank you for any help.
You're processing your data one line at a time, but the X and Y values are on different lines from the segment headers, so a line that contains "Test data 1" never contains a measurement. Your code therefore needs to remember which segment it is currently in (i.e. act as a simple parser). You can also reuse one generic pattern to extract the X and Y values.
import re

# flags that record which segment the loop is currently inside
data1 = data2 = False
xy_pattern = r'X\s+:\s+([\-\d\.]+)\s+Y\s+:\s+([\-\d\.]+)'
for line in data:
    # set state
    if "Test data 1" in line:
        data1 = True
        continue
    elif "Test data 2" in line:
        data1 = False
        data2 = True
        continue
    # extract data
    if data1 and 'Measurement' in line:
        matches = re.findall(xy_pattern, line)
        if matches:
            X_table1, Y_table1 = matches[0]
    elif data2 and 'Measurement' in line:
        matches = re.findall(xy_pattern, line)
        if matches:
            X_table2, Y_table2 = matches[0]
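In the same way, you can check for the Needed line in Test data 1 and the Extra line in Test data 2. The Extra line has only a single value rather than an X/Y pair, so it needs its own pattern. Note, however, that your matches are still strings, so you might want to convert them to floats, depending on what you want to do with them.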
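As a rough sketch of how the same idea could be generalised, the version below keeps one dictionary per "Test data N" header instead of one flag per table, and converts every value to a float. The names `parse_tables`, `SAMPLE`, `HEADER`, `XY` and `SINGLE` are made up for this illustration, and the patterns assume the layout shown in the question (a label, then either `X : <number> Y : <number>` or `: <number>`).

```python
import re

# Sample input copied from the question (assumed to be representative).
SAMPLE = """\
Test data 1
Measurement 1 X : -0.100 Y : 2.300
Something 1 : 0.00
Stuff 1 : 0.00
Needed 1 X : -0.800 Y : 5.300
Test data 2
Measurement 1 X : -0.600 Y : 4.300
Something 1 : 0.30
Stuff 1 : -0.20
Extra 1 : -0.800
"""

NUM = r'(-?\d+(?:\.\d+)?)'  # optionally signed decimal number
HEADER = re.compile(r'^\s*(Test data \d+)\s*$')
XY = re.compile(r'^\s*(.+?)\s+X\s*:\s*' + NUM + r'\s+Y\s*:\s*' + NUM + r'\s*$')
SINGLE = re.compile(r'^\s*(.+?)\s*:\s*' + NUM + r'\s*$')


def parse_tables(lines):
    """Return {table name: {label: float or (x, y) tuple of floats}}."""
    tables = {}
    current = None  # dictionary of the table currently being read
    for line in lines:
        header = HEADER.match(line)
        if header:
            # a header line switches the parser state to a new table
            current = tables.setdefault(header.group(1), {})
            continue
        if current is None:
            continue  # ignore anything before the first header
        xy = XY.match(line)  # must be tried before SINGLE
        if xy:
            name, x, y = xy.groups()
            current[name] = (float(x), float(y))
            continue
        single = SINGLE.match(line)
        if single:
            name, value = single.groups()
            current[name] = float(value)
    return tables


tables = parse_tables(SAMPLE.splitlines())
print(tables["Test data 1"]["Measurement 1"])
print(tables["Test data 1"]["Needed 1"])
print(tables["Test data 2"]["Extra 1"])
```

Keying the results by table name avoids adding a new flag and a new pair of variables for every table, at the cost of parsing lines you may not need (Something, Stuff); you can simply ignore those entries.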