I am trying to extract data from a change log using RegEx. Here is an example of how the change log is structured:
96545
this is some changes in the ticket
some new version: x.x.22
another change
new version: x.y.2.2
120091
this is some changes in the ticket
some new version: z.z.22
another change
another change
another change
new version: z.y.2.2
120092
...
...
...
new version: ***
Here *** is a string that varies for every ID. I was using the RegexStorm Tester to test my RegEx.
So far I have: ^\w{5,6}(.|\n)*?\d{5,6}
However, the result includes the ID of the next ticket, which I need to avoid.
Result:
96545
this is some changes in the ticket
some new version: x.x.22
another change
new version: x.y.2.2
120091
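A minimal Python reproduction of this result, assuming Python's `re` module handles this pattern the same way as the .NET engine behind RegexStorm (it should for these constructs). The lazy `(.|\n)*?` keeps extending only until `\d{5,6}` can match, and the first run of 5 to 6 digits after the first ID is the next ticket's ID, so those digits end up inside the match. The `log` string is a shortened, hypothetical copy of the sample above.

```python
import re

# Shortened copy of the sample change log (hypothetical test data)
log = (
    "96545\n"
    "this is some changes in the ticket\n"
    "some new version: x.x.22\n"
    "another change\n"
    "new version: x.y.2.2\n"
    "120091\n"
    "this is some changes in the ticket\n"
    "new version: z.y.2.2\n"
)

# The pattern from the question
original = r"^\w{5,6}(.|\n)*?\d{5,6}"

match = re.search(original, log)
# The lazy group only stops once \d{5,6} matches, i.e. at the next ID
print(match.group(0).splitlines()[-1])  # → 120091
```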
Expected Result:
96545
this is some changes in the ticket
some new version: x.x.22
another change
new version: x.y.2.2
If the problem is that you capture the ID of the next ticket, just use a positive lookahead to match it without capturing or consuming it:
import re

# text holds the whole change log as one string
# A ticket ends right before a newline followed by the next ticket's ID,
# or at the end of the text for the last ticket (which has no next ID)
pattern = r"\d{5,6}[\s\S]*?(?=\n\d{5,6}|\Z)"
# to extract the first ticket, use search
print(re.search(pattern, text).group(0))
# to extract all tickets as a list, use findall
print(re.findall(pattern, text))
# if the file is too big and you want to process tickets lazily, use finditer
for ticket in re.finditer(pattern, text):
    print(ticket.group(0))
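As a quick check, here is the lookahead pattern run against a shortened, hypothetical copy of the sample log. This version adds `|\Z` to the lookahead so the last ticket, which has no following ID, is also returned (without it, `findall` would return only the first two tickets here). Splitting each match into an ID and a list of change lines is an illustrative extra, not part of the original answer.

```python
import re

# Shortened copy of the sample change log (hypothetical test data)
log = (
    "96545\n"
    "this is some changes in the ticket\n"
    "new version: x.y.2.2\n"
    "120091\n"
    "another change\n"
    "new version: z.y.2.2\n"
    "120092\n"
    "new version: abc"
)

# Stop right before a newline followed by the next 5-6 digit ID,
# or at the end of the string for the last ticket
pattern = r"\d{5,6}[\s\S]*?(?=\n\d{5,6}|\Z)"

tickets = re.findall(pattern, log)
for ticket in tickets:
    # First line is the ticket ID, the remaining lines are its changes
    ticket_id, _, body = ticket.partition("\n")
    print(ticket_id, "->", body.splitlines())
```

Note that if the file ends with a newline, the last match will include that trailing newline, since Python's `\Z` only matches at the very end of the string; calling `.strip()` on the text first avoids this.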