
Python regex for matching next line if present


I need to match log lines like the ones below.

Case 1:

[01:32:12.036,000] <tag> label: val3. STATUS = 0x1
[01:32:12.036,001] <tag> label: val3. MISC = 0x8
[02:58:34.971,000] <tag> label: val2. STATUS = 0x2

Case 2:

[01:32:12.036,000] <tag> label: val3. STATUS = 0x1
[02:58:34.971,000] <tag> label: val2. STATUS = 0x2
[01:32:12.036,001] <tag> label: val2. MISC = 0x6

The line with the MISC value is optional and may be missing. The STATUS line is always present and always precedes the MISC line.

To match this, I am using the following regex: "label: val(\d+). STATUS = (0x[0-9a-fA-F]+)(.*?(label: val(\d+). MISC = (0x[0-9a-fA-F]+)))?"

This works for Case 1 and reports the values correctly. The output for the matched groups is as follows:

MATCH 1
[0] 3
[1] 0x1
[2] 
[01:32:12.036,001] <tag> label: val3. MISC = 0x8
[3] label: val3. MISC = 0x8
[4] 3
[5] 0x8

MATCH 2
[0] 2
[1] 0x2
[2] 
[3] 
[4] 
[5] 

But for Case 2, it does not report the second STATUS (on line 2) as its own match, as shown below:

MATCH 1
[0] 3
[1] 0x1
[2] 
[02:58:34.971,000] <tag> label: val2. STATUS = 0x2
[01:32:12.036,001] <tag> label: val2. MISC = 0x6
[3] label: val2. MISC = 0x6
[4] 2
[5] 0x6

I need two matches here as well, with the first match not reporting MISC. What am I doing wrong?


Solution

  • Here is a capture pattern, meant to be applied to one line at a time (or to the whole text with re.MULTILINE).
    It returns a group for each of the values on the line.

    ^\[(.+?)\] <(.+?)> (.+?): (.+?) (STATUS|MISC) .+0x(.+?)$
    
    01:32:12.036,000, tag, label, val3., STATUS, 1
    01:32:12.036,001, tag, label, val3., MISC, 8
    02:58:34.971,000, tag, label, val2., STATUS, 2
    01:32:12.036,000, tag, label, val3., STATUS, 1
    02:58:34.971,000, tag, label, val2., STATUS, 2
    01:32:12.036,001, tag, label, val2., MISC, 6
    

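    For more refined values, try the following.
    It captures the number after "val", the STATUS or MISC keyword, and the hex digits without the "0x" prefix.
    The listing below shows only the first and last of these groups.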

    ^\[.+?\] <.+?> .+?: val(.+?)\. (STATUS|MISC) .+0x(.+?)$
    
    3, 1
    3, 8
    2, 2
    3, 1
    2, 2
    2, 6
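
As for why Case 2 produces a single match: group 2 in the question's output contains a line break, which suggests the regex was run with re.DOTALL. If so, the lazy .*? is free to run across the second STATUS line until it reaches a MISC line, so that STATUS is consumed inside the first match. The sketch below shows two possible ways around this, under the assumption that a MISC line belongs to the STATUS line directly above it. pair_by_line uses the second pattern above on each line; pair_by_regex uses a variant of the question's regex in which the optional part may only cover the single next line. The function names and the PAIR_RE pattern are illustrative, not part of the original answer.

```python
import re

# The answer's second pattern, applied to one line at a time.
LINE_RE = re.compile(r"^\[.+?\] <.+?> .+?: val(.+?)\. (STATUS|MISC) .+0x(.+?)$")

# Hypothetical variant of the question's regex: dots are escaped, and the
# optional MISC part may only span the one line right after the STATUS line.
PAIR_RE = re.compile(
    r"label: val(\d+)\. STATUS = (0x[0-9a-fA-F]+)"
    r"(?:\n[^\n]*label: val(\d+)\. MISC = (0x[0-9a-fA-F]+))?"
)


def pair_by_line(text):
    """Return (val, status, misc) per STATUS line; misc is None when absent."""
    records = []
    after_status = False  # True only when the previous line was a STATUS line
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m is None:
            after_status = False
            continue
        val, kind, digits = m.groups()
        if kind == "STATUS":
            records.append([val, "0x" + digits, None])
            after_status = True
        else:
            # Assumption: MISC belongs to the STATUS line directly above it.
            if after_status:
                records[-1][2] = "0x" + digits
            after_status = False
    return [tuple(r) for r in records]


def pair_by_regex(text):
    """Same result as pair_by_line, using a single regex over the whole text."""
    return [(m.group(1), m.group(2), m.group(4)) for m in PAIR_RE.finditer(text)]


CASE_1 = "\n".join([
    "[01:32:12.036,000] <tag> label: val3. STATUS = 0x1",
    "[01:32:12.036,001] <tag> label: val3. MISC = 0x8",
    "[02:58:34.971,000] <tag> label: val2. STATUS = 0x2",
])
CASE_2 = "\n".join([
    "[01:32:12.036,000] <tag> label: val3. STATUS = 0x1",
    "[02:58:34.971,000] <tag> label: val2. STATUS = 0x2",
    "[01:32:12.036,001] <tag> label: val2. MISC = 0x6",
])

for case in (CASE_1, CASE_2):
    print(pair_by_line(case))
    print(pair_by_regex(case))
```

Both functions give two records for each case. In Case 2 the first record has no MISC value and the second carries 0x6, because that MISC line directly follows the val2 STATUS line. The single-regex variant assumes "\n" line endings and no trailing whitespace after the hex value.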