Repeatedly extracting substrings between specific characters in a text file (Python)


I have several pieces of data stored in a text file. I am trying to extract each type of data into its own list so that I can plot it and make various figures. There are thousands of values, so extracting them by hand isn't really an option. An example of the text file is:

"G4WT7 > interaction in material = MATERIAL
G4WT7 > process PROCESSTYPE
G4WT7 > at position [um] = (x,y,z)
G4WT7 > with energy [keV] = 0.016
G4WT7 > track ID  and parent ID = ,a,b
G4WT7 > with mom dir = (x,y,z)
G4WT7 > number of secondaries= c
G4WT1 > interaction in material = MATERIAL
G4WT1 > process PROCESSTYPE
G4WT1 > at position [um] = (x,y,z)
G4WT1 > with energy [keV] = 0.032
G4WT1 > track ID  and parent ID = ,a,b
G4WT1 > with mom dir = (x,y,z)
G4WT1 > number of secondaries= c"

I would like to extract values such as the string following "energy [keV] =" (so 0.016, 0.032, etc.) into a list. I hope to be able to separate all the other data in a similar way.

So far I've tried to use regex, as follows:

import re
file = open('file.txt')
textfile =file.read()
Energy = re.findall('[keV] = ;(.*)G', textfile)

But it just generates an empty list: []. I'm a newbie to Python, so apologies if the answer is obvious. Any help would be greatly appreciated.


Solution

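  • You need to escape the square brackets. Unescaped, [keV] is a character class that matches a single character (k, e or V), not the literal text "[keV]". The stray ";" and the trailing "G" in your pattern also have to go: there is no ";" in the file, and "." does not match a newline, so (.*) can never reach the "G" on the next line.

    Energy = re.findall(r'\[keV\] = (.*)', textfile)
    

    ... or, to be on the safe side, you can use re.escape to make sure all special characters are properly escaped, e.g.:

    Energy = re.findall(re.escape('[keV] = ') + '(.*)', textfile)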
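
Since the goal is to plot the values, it may help to convert the matches to numbers straight away. The following is only a minimal sketch: the sample string stands in for the contents of the file, and the names sample, energy_pattern and energies are made up for illustration. It assumes the energies are written as plain decimals, optionally with an exponent.

```python
import re

# Hypothetical sample that mirrors the log format shown in the question;
# in practice this would be the text read from the file.
sample = """G4WT7 > process PROCESSTYPE
G4WT7 > with energy [keV] = 0.016
G4WT7 > number of secondaries= c
G4WT1 > process PROCESSTYPE
G4WT1 > with energy [keV] = 0.032
G4WT1 > number of secondaries= c
"""

# The brackets are escaped so they match literally. The group captures
# only the number that follows (assumed: decimal, optional exponent).
energy_pattern = re.compile(
    r"energy \[keV\] = ([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)"
)

# findall returns the captured strings; convert them to floats for plotting.
energies = [float(value) for value in energy_pattern.findall(sample)]
print(energies)
```

The same approach should carry over to the other fields by swapping the literal prefix and the capture group, e.g. a separate pattern for the position or the number of secondaries.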