I'm trying to extract certain values from a raw text file like this:
Number of zero columns: 4
Memory requirement - global matrix: 1571340 solver (totally): 1571340
P1127_VELOCITIES #001000 Step: 59 Iteration: 2 Time: 0.04055 0.0015347
P2243_VELOCITIES #001000 Step: 59 Iteration: 2 Time: 0.04055 0.0017193
P3387_VELOCITIES #001000 Step: 59 Iteration: 2 Time: 0.04055 0.0015347
% of load in interval Step: 59 Iteration: 2 Time: 0.04055 0.0400000 0.0400000
summation % of load in interval Step: 59 Iteration: 2 Time: 0.04055 0.0800000
Number of zero columns: 4
Memory requirement - global matrix: 1571340 solver (totally): 1571340
P1127_VELOCITIES #001000 Step: 59 Iteration: 2 Time: 0.01638 -0.0016876
P2243_VELOCITIES #001000 Step: 59 Iteration: 2 Time: 0.01638 -0.0018896
P3387_VELOCITIES #001000 Step: 59 Iteration: 2 Time: 0.01638 -0.0016876
% of load in interval Step: 59 Iteration: 2 Time: 0.01638 0.0400000 0.0400000
summation % of load in interval Step: 59 Iteration: 2 Time: 0.01638 0.0800000
I want to extract the P1127_VELOCITIES values using this code:
import re

P1127_positive = re.compile(r'P1127_VELOCITIES #001000 Step: (\d+) Iteration: (\d+) Time: (\d+\.\d+) (\d*\.\d+|-\d*\.\d+)')
P1127_negative = re.compile(r'P1127_VELOCITIES #001000 Step: (\d+) Iteration: (\d+) Time: (\d+\.\d+) (\d*\.\d+|-\d*\.\d+)')
def Extract_Data(filepath, expression_positive, expression_negative, data):
    velocity_list = []
    time_list = []
    #negative_data = []
    with open(filepath) as file:
        for line in file:
            data.extend(expression_positive.findall(line))
    with open(filepath) as file:
        for line in file:
            data.extend(expression_negative.findall(line))
    print(data[0])
    print(data[1])
    for data_tuple in data:
        step, iteration, time, velocity = data_tuple
        velocity_list.append(float(velocity))
        time_list.append(float(time))
    return velocity_list, time_list
However, I want to extract all float values at the right end, not positive and negative values separately. As you can see in the text file, the positive values have 2 spaces (e.g. Time: 0.04055[space][space]0.0015347) while the negative values only have 1 space (e.g. Time: 0.01638[space]-0.0016876).
Is there a way to extract both kinds of values with a single re.compile pattern (like what I have above, but matching both)? What expression would you recommend, e.g. ([-+]?\d\.\d+)?
Cheers!
The regexes in your code seem overly rigid for the file you've provided. I don't see any reason for them to be so strict that changing one character requires a new pattern. The file doesn't appear to vary enough to justify being that specific about the exact number of spaces and the formatting of each line.
This snippet does the job cleanly on the file you've shared (I'm using append rather than extend so that each row's (time, velocity) pair is preserved). It's simple to add more requirements to match lines more specifically as needed (if you wish to specify a step or iteration, for example). You can also build the regex pattern dynamically if you'd like to drop this into a function and use it to filter by different velocity values.
import re

pattern = r"P1127_VELOCITIES.+?Time:\s*(\S+)\s+(\S+)\s*$"
data = []

with open("file.txt") as f:
    for line in f:
        m = re.match(pattern, line)
        if m:
            data.append(tuple(map(float, m.groups())))

print(data)
Output:
[(0.04055, 0.0015347), (0.01638, -0.0016876)]
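If you do want to build the pattern dynamically and keep re.compile, something like the following might work. It is only a sketch: build_pattern and extract_series are names I made up for illustration, and it assumes every line of interest starts with the label and ends with the time followed by the value. Using \s+ between the two numbers and an optional [-+]? sign lets one pattern cover both the two-space positive case and the one-space negative case.

```python
import re


def build_pattern(label):
    """Compile a pattern for lines starting with `label` (hypothetical helper)."""
    number = r"[-+]?\d*\.\d+"  # optional sign, optional leading digits, decimals
    return re.compile(
        re.escape(label)            # treat the label as literal text
        + r"\b.*?Time:\s*(" + number + r")"
        + r"\s+(" + number + r")\s*$"  # \s+ accepts one or more spaces
    )


def extract_series(lines, label):
    """Return (velocity_list, time_list) for all lines matching `label`."""
    pattern = build_pattern(label)
    velocity_list, time_list = [], []
    for line in lines:
        m = pattern.match(line)
        if m:
            time, velocity = map(float, m.groups())
            time_list.append(time)
            velocity_list.append(velocity)
    return velocity_list, time_list


# Sample lines copied from the question (two spaces before the positive value).
sample = [
    "Number of zero columns: 4\n",
    "P1127_VELOCITIES #001000 Step: 59 Iteration: 2 Time: 0.04055  0.0015347\n",
    "P2243_VELOCITIES #001000 Step: 59 Iteration: 2 Time: 0.04055  0.0017193\n",
    "P1127_VELOCITIES #001000 Step: 59 Iteration: 2 Time: 0.01638 -0.0016876\n",
]

velocities, times = extract_series(sample, "P1127_VELOCITIES")
print(velocities)
print(times)
```

To read from disk instead, pass the open file object as lines, since iterating over a file yields one line at a time. The function returns the lists in the same order as Extract_Data in the question (velocities first, then times).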