
How to write a regex in Python with named groups to match this?


I have a file that contains lines like the following.

comm=adbd pid=11108 prio=120 success=1 target_cpu=001

I wrote the following regex to match them.

_sched_wakeup_pattern = re.compile(r"""
comm=(?P<next_comm>.+?)
\spid=(?P<next_pid>\d+)
\sprio=(?P<next_prio>\d+)
\ssuccess=(?P<success>\d)
\starget_cpu=(?P<target_cpu>\d+)
""", re.VERBOSE)

But now I also have lines like the following, where the success component is missing.

comm=rcu_preempt pid=7 prio=120 target_cpu=007

How do I modify my regex to match both cases? I tried putting a * after every element on the line containing "success", but it throws errors.


Solution

  • A solution using an optional non-capturing group and the findall function of the third-party regex module:

    import regex
    ...
    with open('lines.txt', 'r') as fh:  # assuming 'lines.txt' is your input file
        commlines = fh.read()
    
    _sched_wakeup_pattern = regex.compile(r"""
    comm=(?P<next_comm>\S+?)
    \spid=(?P<next_pid>\d+)
    \sprio=(?P<next_prio>\d+)
    (?:\ssuccess=(?P<success>\d))?   # label and digit are optional together; findall yields '' when absent
    \starget_cpu=(?P<target_cpu>\d+)
    """, regex.VERBOSE)
    
    result = regex.findall(_sched_wakeup_pattern, commlines)
    
    template = "{0:15}|{1:10}|{2:9}|{3:7}|{4:10}" # column widths
    print(template.format("next_comm", "next_pid", "next_prio", "success", "target_cpu")) # header
    
    for t in result:
        print(template.format(*t))
    

    Beautified output:

    next_comm      |next_pid  |next_prio|success|target_cpu
    rcu_preempt    |7         |120      |       |007       
    kworker/u16:2  |73        |120      |       |006       
    kworker/u16:4  |364       |120      |       |005       
    adbd           |11108     |120      |1      |001       
    kworker/1:1    |16625     |120      |1      |001       
    rcu_preempt    |7         |120      |1      |002
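
Since the question asks about named groups, the same idea can also be sketched with the standard-library re module, using finditer and groupdict so that each field is looked up by name rather than by tuple position. This is only an illustration under assumptions: SAMPLE holds just the two lines quoted in the question, parse_wakeups is a made-up helper name, and comm is assumed to contain no whitespace (otherwise keep the .+? from the question).

```python
import re

# Hypothetical sample input: the two lines quoted in the question.
SAMPLE = """\
comm=adbd pid=11108 prio=120 success=1 target_cpu=001
comm=rcu_preempt pid=7 prio=120 target_cpu=007
"""

SCHED_WAKEUP = re.compile(r"""
    comm=(?P<next_comm>\S+)            # assumes no whitespace in comm
    \spid=(?P<next_pid>\d+)
    \sprio=(?P<next_prio>\d+)
    (?:\ssuccess=(?P<success>\d))?     # the whole success part is optional
    \starget_cpu=(?P<target_cpu>\d+)
""", re.VERBOSE)


def parse_wakeups(text):
    """Return one dict of named groups per match found in text."""
    return [m.groupdict() for m in SCHED_WAKEUP.finditer(text)]


rows = parse_wakeups(SAMPLE)
for row in rows:
    # groupdict() reports None for a named group that did not participate.
    print(row["next_comm"], row["next_pid"], row["success"])
```

With groupdict, a missing success field shows up as None instead of the empty string that findall produces, which makes the two cases easy to tell apart.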