I am trying to match a pattern in an EEPROM dump text file to locate a certain address, and then move 4 places forward once the search hits. I have tried the following code to find the pattern:
import re

regexp_list = ('A1 B2')
with open("dump.txt", 'r') as f:
    line = f.read()
pattern = re.compile(regexp_list)
matches = re.findall(pattern, line)
for match in matches:
    print(match)
This scans the dump for A1 B2 and displays it if found. I need to add more such addresses to the search criteria, e.g. 'C1 B2', 'D1 F1'.
I tried making regexp_list a list instead of a tuple, but it didn't work.
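One way to search for several addresses at once is to join them into a single alternation before compiling, since re.compile expects one pattern string rather than a list. The following is only a sketch: the names addresses and pattern and the sample line are made up for illustration, and the word boundaries are an assumption that each address should start and end on a whole byte.

```python
import re

# Hypothetical list of byte pairs to look for.
addresses = ["A1 B2", "C1 B2", "D1 F1"]

# Escape each entry and join them with "|" so any one of them can match.
# The \b anchors keep a match from starting inside a longer token.
pattern = re.compile(r"\b(?:" + "|".join(re.escape(a) for a in addresses) + r")\b")

# Made-up sample line containing two of the three addresses.
line = "0120 86 1B 00 A1 B2 FF 15 D1 F1 C2 D1 E4 00 25 04 00"

matches = pattern.findall(line)
print(matches)
```

Because the group is non-capturing, findall returns the full matched text for each hit.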
That is the first problem. Second, once the search hits, I want to move 4 places forward and read the data from there (see below).
Input:
0120 86 1B 00 A1 B2 FF 15 A0 05 C2 D1 E4 00 25 04 00
Here, when the search finds the A1 B2 pattern, I want to move 4 places, i.e. save the data C2 D1 E4 from the dump.
Expected Output:
C2 D1 E4
I hope the explanation was clear.
#Thanks to @kcorlidy
Here's the final piece of code which I had to enter to delete the addresses in the first column.
newtxt = text.split("A1 B2")[1].split()[4:][:5]
# keep the byte tokens; anything longer than two characters is an address
newtxt = [i for i in newtxt if len(i) <= 2]
So the full code looks like this:
import re

text = open('dump.txt').read()
regex = r"(A1\s+B2)(\s+\w+){4}((\s+\w{2}(\s\w{4})?){3})"
for ele in re.findall(regex, text, re.MULTILINE):
    print(" ".join([ok for ok in ele[2].split() if len(ok) == 2]))
# select the next 5 elements, which may include the address from the first column
print(text.split("A1 B2")[1].split()[4:][:5])
newtxt = text.split("A1 B2")[1].split()[4:][:5]
# keep the byte tokens; anything longer than two characters is an address
newtxt = [i for i in newtxt if len(i) <= 2]
Input:
0120 86 1B 00 00 C1 FF 15 00 00 A1 B2 00 00 00 00 C2
0130 D1 E4 00 00 FF 04 01 54 00 EB 00 54 89 B8 00 00
Output:
C2 0130 D1 E4 00
C2 D1 E4 00
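When dropping the address tokens, building a new list is safer than calling remove on the list being looped over, because removing an element shifts the remaining ones and the loop can skip the next item. A small sketch of both behaviours, where drop_addresses and the adjacent list are hypothetical names and data used only for illustration:

```python
def drop_addresses(tokens):
    """Return only the byte tokens, leaving out longer tokens
    such as the four-digit address column."""
    return [tok for tok in tokens if len(tok) <= 2]


tokens = ["C2", "0130", "D1", "E4", "00"]
print(drop_addresses(tokens))

# Pitfall: removing while iterating skips the element that follows
# each removed one, so the second address survives here.
adjacent = ["0120", "0130", "C2"]
for tok in adjacent:
    if len(tok) > 2:
        adjacent.remove(tok)
print(adjacent)
```

With a single address in the slice the two approaches give the same result; the difference only shows up when two long tokens sit next to each other.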
You can extract the text with a regex, or you can do it by splitting the text.
Regex:
(A1\s+B2): matches A1, then one or more whitespace characters, then B2.
(\s+\w+){4}: moves 4 places.
((\s+\w{2}(\s\w{4})?){3}): extracts 3 groups of characters. Each group may be followed by 4 unneeded characters (the address at the start of the next line), which are filtered out before the groups are combined into one string.
Split:
Note: If you have a very long text or multiple lines, don't use this approach.
text.split("A1 B2")[1]: splits the text into two parts; the part after the pattern is the one we need.
.split(): splits on whitespace, giving the list ['FF', '15', 'A0', '05', 'C2', 'D1', 'E4', '00', '25', '04', '00'].
[4:][:3]: moves 4 places and selects the first three elements.
Test code:
import re
text = """0120 86 1B 00 A1 B2 FF 15 A0 05 C2 D1 E4 00 25 04 00
0120 86 1B 00 00 C1 FF 15 00 00 A1 B2 00 00 00 00 C2
0130 D1 E4 00 00 FF 04 01 54 00 EB 00 54 89 B8 00 00 """
regex = r"(A1\s+B2)(\s+\w+){4}((\s+\w{2}(\s\w{4})?){3})"
for ele in re.findall(regex, text, re.MULTILINE):
    # remove the strings we do not need, such as whitespace, 0123, \n
    print(" ".join([ok for ok in ele[2].split() if len(ok) == 2]))
print(text.split("A1 B2")[1].split()[4:][:3])
Output
C2 D1 E4
C2 D1 E4
['C2', 'D1', 'E4']
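For longer dumps, where the split approach is not recommended, a token-based variant may be easier to reason about. The sketch below is not part of the code above: data_bytes and bytes_after are hypothetical helpers, and it assumes the first column of every line is the address and all remaining columns are data bytes.

```python
def data_bytes(dump):
    """Flatten a hex dump into a list of byte tokens, dropping the
    first column of each line (assumed to be the address)."""
    tokens = []
    for line in dump.splitlines():
        fields = line.split()
        tokens.extend(fields[1:])  # fields[0] is the address column
    return tokens


def bytes_after(dump, marker, skip=4, count=3):
    """For each occurrence of `marker` (a space-separated byte sequence),
    return the `count` bytes found `skip` places after it."""
    tokens = data_bytes(dump)
    needle = marker.split()
    results = []
    for i in range(len(tokens) - len(needle) + 1):
        if tokens[i:i + len(needle)] == needle:
            start = i + len(needle) + skip
            results.append(tokens[start:start + count])
    return results


dump = """0120 86 1B 00 00 C1 FF 15 00 00 A1 B2 00 00 00 00 C2
0130 D1 E4 00 00 FF 04 01 54 00 EB 00 54 89 B8 00 00"""

print(bytes_after(dump, "A1 B2"))
```

Since the addresses are removed before searching, a result that wraps onto the next line needs no extra filtering, and every occurrence of the marker is reported rather than only the first.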