Good morning,
I have a string that I need to parse, printing the content of two named groups, knowing that one of them might not exist.
The string looks like this (basically the content of /proc/pid/cmdline):
"""
<some chars with letters / numbers / space / punctuation> /CLASS_NAME:myapp.server.starter.StarterHome /PARAM_XX:value_XX /PARAM_XX:value_XX /CONFIG_FILE:myapp.server.config.myconfig.txt /PARAM_XX:value_XX /PARAM_XX:value_XX /PARAM_XX:value_XX <some chars with letters / numbers / space / punctuation>
"""
My processes all follow almost the same pattern, that is:
/CLASS_NAME:myapp.server.starter.StarterHome
is always present, but
/CONFIG_FILE:myapp.server.config.myconfig.txt
is NOT always present.
I'm using python2 with the re module to catch the values. So far my pattern looks like this, and I'm able to catch the value I want corresponding to /CLASS_NAME:
re.compile('CLASS_NAME:\w+\W\w+\W\w+\W(?P<class>\w+)')
Then, because /CONFIG_FILE may or may not be present, I added the following to my regexp:
re.compile(r"""CLASS_NAME:\w+\W\w+\W\w+\W(?P<class>\w+).*?
(CONFIG_FILE:\w+\W\w+\W\w+\W(?P<cnf>\w+.txt))?
""", re.X)
My understanding is that the second part of my regexp is optional because the whole part is between parentheses followed by ?.
Unfortunately my assumption is wrong, as the cnf group is never captured.
I also tried removing the first ?, but it didn't help.
I made several attempts in PYTHEX to try to understand my regexp but couldn't find a solution.
Does anyone have a suggestion to resolve my case?
You can wrap the whole optional part within an optional non-capturing group and make the capturing group for CONFIG_FILE obligatory:
re.compile(r"""CLASS_NAME:(?:\w+\W+){3}(?P<class>\w+)(?:.*?
(CONFIG_FILE:(?:\w+\W+){3}(?P<cnf>\w+\.txt)))?
""", re.X)
In case there are newlines, use the re.X | re.S modifier options. Note that \w+\W\w+\W\w+\W is more compactly written as (?:\w+\W+){3} (which also allows more than one non-word char after each word).
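As a minimal sketch of how the pattern above behaves, the snippet below runs it against three made-up command lines: one with CONFIG_FILE, one without, and one where a newline separates the two parts. The `java` prefix, the helper name `extract` and the sample strings are assumptions for illustration only; the code is written so that it should run on both python2 and python3.

```python
import re

BODY = r"""CLASS_NAME:(?:\w+\W+){3}(?P<class>\w+)(?:.*?
(CONFIG_FILE:(?:\w+\W+){3}(?P<cnf>\w+\.txt)))?
"""
LINE_RE = re.compile(BODY, re.X)            # . does not match a newline
DOTALL_RE = re.compile(BODY, re.X | re.S)   # . also matches a newline


def extract(cmdline, regex=DOTALL_RE):
    """Return (class, cnf); cnf is None when CONFIG_FILE is absent."""
    m = regex.search(cmdline)
    if m is None:
        return None
    return m.group("class"), m.group("cnf")


# Hypothetical sample data modelled on the string in the question.
with_cfg = ("java /CLASS_NAME:myapp.server.starter.StarterHome /PARAM_XX:value_XX "
            "/CONFIG_FILE:myapp.server.config.myconfig.txt /PARAM_XX:value_XX")
without_cfg = "java /CLASS_NAME:myapp.server.starter.StarterHome /PARAM_XX:value_XX"
multiline = with_cfg.replace(" /CONFIG_FILE", "\n/CONFIG_FILE")

print(extract(with_cfg))            # ('StarterHome', 'myconfig.txt')
print(extract(without_cfg))         # ('StarterHome', None)
print(extract(multiline, LINE_RE))  # ('StarterHome', None): . stops at the newline
print(extract(multiline))           # ('StarterHome', 'myconfig.txt')
```

Checking `m.group("cnf")` against None is enough to tell whether the optional part was found.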
The main difference is the (?:.*?(CONFIG_FILE:(?:\w+\W+){3}(?P<cnf>\w+\.txt)))? part:
- (?: - start of an optional (as there is a greedy ? quantifier after it) non-capturing group matching
  - .*? - any 0+ chars, as few as possible
  - (CONFIG_FILE:(?:\w+\W+){3}(?P<cnf>\w+\.txt)) - matches
    - CONFIG_FILE: - a literal substring
    - (?:\w+\W+){3} - three sequences of 1+ word chars followed with 1+ non-word chars
    - (?P<cnf>\w+\.txt) - Group cnf: 1+ word chars, a dot (note it should be escaped) and then txt
- )? - end of the optional non-capturing group (that will be tried once)
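To see why the grouping matters, here is a small comparison of the attempt from the question with the corrected pattern. In the attempt, the lazy .*? first tries to match zero chars, the separate optional group then fails at that position and is allowed to match nothing, so the overall match succeeds immediately and cnf stays empty. With .*? inside the optional group, the engine has to extend .*? up to CONFIG_FILE before it may give up on the group. The sample string is an assumption for illustration.

```python
import re

# The attempt from the question: lazy .*? followed by a separate optional group.
attempt = re.compile(r"""CLASS_NAME:\w+\W\w+\W\w+\W(?P<class>\w+).*?
(CONFIG_FILE:\w+\W\w+\W\w+\W(?P<cnf>\w+.txt))?
""", re.X)

# The corrected pattern: .*? lives inside the optional non-capturing group.
fixed = re.compile(r"""CLASS_NAME:(?:\w+\W+){3}(?P<class>\w+)(?:.*?
(CONFIG_FILE:(?:\w+\W+){3}(?P<cnf>\w+\.txt)))?
""", re.X)

# Hypothetical sample modelled on the string in the question.
sample = ("/CLASS_NAME:myapp.server.starter.StarterHome /PARAM_XX:value_XX "
          "/CONFIG_FILE:myapp.server.config.myconfig.txt /PARAM_XX:value_XX")

m_attempt = attempt.search(sample)
m_fixed = fixed.search(sample)

print(m_attempt.group(0))      # CLASS_NAME:myapp.server.starter.StarterHome
print(m_attempt.group("cnf"))  # None
print(m_fixed.group("cnf"))    # myconfig.txt
```

The whole match of the attempt ends right after the class name, which shows that the engine never looked further for CONFIG_FILE.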