I would like to extract a certain piece of text from a file using Python's re (regex) module, taking into account that the text file contains multiple newlines and spaces. The file may have several data blocks, but the only one required is the one with specific keywords. In my case, it should belong to a group that contains the "Route-Details" keyword.
Let us say that the file (sample.txt) is as shown below.
.
.
.
Host1<-->Host2 Con. ID: 0x0fc2f0d9 (abc123)
Con. Information:
[Gw] Route-Details
R-Code: 0xaaaa (1a2) Route-Details
Router-ID: 0x21 (a4) [Gw]
Path-Code: 0x00e (15)
Data: 123-abcd.djsjdkks www.somesite. port 11
Coded info
aa aa aa aa aa aa aa aa 1111-aaa
aa aa aa aa aa aa aa aa 1111-aaa
.
.
.
This is what I have written:
import re
with open("sample.txt", "r") as fl:
    in_file = fl.read()
print(re.search('(?<=Route-Details).* Data:', in_file, re.DOTALL).group())
I expect to obtain this.
123-abcd.djsjdkks www.somesite. port 11
However, I got this.
R-Code: 0xaaaa (1a2) Route-Details
Router-ID: 0x21 (a4) [Gw]
Path-Code: 0x00e (15)
Data:
Is there a simple solution for this, ideally with an explanation? Thanks so much for your help.
You can use a positive look-behind and capturing group:
re.findall(r'(?<=Data: )(.*?)\n', text)
Yields:
['123-abcd.djsjdkks www.somesite. port 11']
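As a self-contained sketch of the look-behind approach, the snippet below runs the same pattern on a string copied from the relevant lines of the question's sample. The variable name text and the inline string are assumptions for illustration; in practice the string would come from reading sample.txt.

```python
import re

# Sample text copied from the question (only the relevant lines).
text = (
    "[Gw] Route-Details\n"
    "R-Code: 0xaaaa (1a2) Route-Details\n"
    "Router-ID: 0x21 (a4) [Gw]\n"
    "Path-Code: 0x00e (15)\n"
    "Data: 123-abcd.djsjdkks www.somesite. port 11\n"
    "Coded info\n"
)

# (?<=Data: ) asserts that "Data: " sits immediately before the match
# without consuming it; (.*?)\n then captures the rest of that line.
matches = re.findall(r'(?<=Data: )(.*?)\n', text)
print(matches)  # ['123-abcd.djsjdkks www.somesite. port 11']
```

Because the pattern contains a single capturing group, re.findall returns the captured text of each match as a list of strings.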
Additionally, you can try the following to include the Route-Details condition you specified:
re.findall(r'(?<=Route-Details).*?(?<=Data: )(.*?)\n', text, re.DOTALL)
For a detailed explanation, see here. Also, re.DOTALL specifies that the . character will match all characters, including newlines.
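To illustrate why re.DOTALL matters for the Route-Details pattern, here is a minimal sketch on a made-up input with two blocks, only the second of which contains "Route-Details". The input string and the "other-block" value are hypothetical and not taken from the question's file.

```python
import re

# Hypothetical input: two blocks, only the second contains "Route-Details".
text = (
    "Con. Information:\n"
    "Data: other-block\n"
    "\n"
    "Con. Information:\n"
    "[Gw] Route-Details\n"
    "Path-Code: 0x00e (15)\n"
    "Data: 123-abcd.djsjdkks www.somesite. port 11\n"
)

pattern = r'(?<=Route-Details).*?(?<=Data: )(.*?)\n'

# Without DOTALL, "." cannot cross the newline after "Route-Details",
# so the pattern never reaches the "Data: " line.
without_dotall = re.findall(pattern, text)

# With DOTALL, ".*?" lazily spans the lines between the keyword and
# "Data: ", and the group captures the remainder of that line.
with_dotall = re.findall(pattern, text, re.DOTALL)

print(without_dotall)  # []
print(with_dotall)     # ['123-abcd.djsjdkks www.somesite. port 11']
```

Note that the "Data:" line of the first block is skipped because it appears before any "Route-Details" keyword.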