I have a file that contains lines like the following:
comm=adbd pid=11108 prio=120 success=1 target_cpu=001
I have written the following regex to match them:
_sched_wakeup_pattern = re.compile(r"""
comm=(?P<next_comm>.+?)
\spid=(?P<next_pid>\d+)
\sprio=(?P<next_prio>\d+)
\ssuccess=(?P<success>\d)
\starget_cpu=(?P<target_cpu>\d+)
""", re.VERBOSE)
But now I also have lines like the following, where the success component is missing:
comm=rcu_preempt pid=7 prio=120 target_cpu=007
How do I modify my regex to match both cases? I tried putting a * after every element on the line containing "success", but it throws errors.
Here is a solution using an optional non-capturing group and the regex.findall function:
import regex
...
# assuming 'lines.txt' is your input file
with open('lines.txt', 'r') as fh:
    commlines = fh.read()
_sched_wakeup_pattern = regex.compile(r"""
comm=(?P<next_comm>[\S]+?)
\spid=(?P<next_pid>\d+)
\sprio=(?P<next_prio>\d+)
(?:\ssuccess=(?P<success>\d))?  # optional as a whole: key and digit appear together or not at all
\starget_cpu=(?P<target_cpu>\d+)
""", regex.VERBOSE)
result = regex.findall(_sched_wakeup_pattern, commlines)
template = "{0:15}|{1:10}|{2:9}|{3:7}|{4:10}" # column widths
print(template.format("next_comm", "next_pid", "next_prio", "success", "target_cpu")) # header
for t in result:
    print(template.format(*t))
Beautified output:
next_comm      |next_pid  |next_prio|success|target_cpu
rcu_preempt    |7         |120      |       |007
kworker/u16:2  |73        |120      |       |006
kworker/u16:4  |364       |120      |       |005
adbd           |11108     |120      |1      |001
kworker/1:1    |16625     |120      |1      |001
rcu_preempt    |7         |120      |1      |002
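If you would rather not install the third-party regex module, the same idea should work with the standard-library re module that the question already uses. Below is a minimal sketch, not a definitive implementation. It assumes the comm value never contains whitespace, and the names SCHED_WAKEUP, SAMPLE and parse_wakeups are made up for illustration. It wraps the whole " success=<digit>" part in a single optional non-capturing group and reads the fields by name with finditer and groupdict.

```python
import re

# Same fields as in the question. The " success=<digit>" part sits inside one
# optional non-capturing group, so it is matched as a unit or skipped entirely.
# Assumption: comm never contains whitespace, hence \S+.
SCHED_WAKEUP = re.compile(r"""
    comm=(?P<next_comm>\S+)
    \spid=(?P<next_pid>\d+)
    \sprio=(?P<next_prio>\d+)
    (?:\ssuccess=(?P<success>\d))?
    \starget_cpu=(?P<target_cpu>\d+)
    """, re.VERBOSE)

# The two sample lines from the question, one with and one without "success".
SAMPLE = """\
comm=adbd pid=11108 prio=120 success=1 target_cpu=001
comm=rcu_preempt pid=7 prio=120 target_cpu=007
"""


def parse_wakeups(text):
    """Return one dict of named fields for every match found in text."""
    return [m.groupdict() for m in SCHED_WAKEUP.finditer(text)]


rows = parse_wakeups(SAMPLE)
for row in rows:
    print(row)
```

Note that groupdict() reports a missing success field as None, whereas findall returns an empty string for it, so check for None before converting the value to an int.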