regex python-2.7 xml-parsing regex-negation

Using Regex to extract a specific xml tag


I have this XML string:

<aof xmlns="http://tsng.jun.net/jppos/conig/hello"><num>3</num><desc>addy02</desc><tpcs>5</tpcs></aof>

I need to extract 5 using regex.

What I have done is:

regex = re.compile(r'tag+</.+>\s*(.+)\s*<.+>')

Here tag is 'tpcs', but it is returning an empty result.

Can someone please help?


Solution

  • As posted in the comments, this regex does the trick:

    (?<=<tpcs>).*?(?=<\/tpcs>)
    

    As seen in this demo.
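
For reference, here is a minimal sketch of how the pattern above might be applied in Python with the standard re module. The variable names (xml, pattern, value) are only illustrative, and the XML string is the one from the question without the stray trailing quote.

```python
import re

# The XML string from the question (stray trailing quote removed).
xml = ('<aof xmlns="http://tsng.jun.net/jppos/conig/hello">'
       '<num>3</num><desc>addy02</desc><tpcs>5</tpcs></aof>')

# Same pattern as above: lookbehind for <tpcs>, lazy match, lookahead for </tpcs>.
pattern = re.compile(r'(?<=<tpcs>).*?(?=<\/tpcs>)')

match = pattern.search(xml)
# search() returns None when nothing matches, so guard before calling group().
value = match.group(0) if match else None
print(value)  # 5
```

The backslash before the slash is not needed in Python, but it is harmless. Note that the match is the string '5', so it would still need int() if a number is wanted.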

    Explanation:

    • (?<=<tpcs>) is a positive lookbehind (?<=...): it asserts that the literal string <tpcs> comes immediately before the text to match, without including it in the match.
    • .*? matches the content itself. The dot matches any character (except a newline, by default), and the * repeats it zero or more times. The ? that follows makes the quantifier lazy, so it matches as few characters as possible and stops at the first occurrence of what comes next.
    • (?=<\/tpcs>) is a positive lookahead (?=...): it asserts that the literal string </tpcs> comes immediately after the matched text, again without including it in the match.
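
To illustrate the lazy quantifier, the sketch below compares .*? with a greedy .* on a made-up string containing two <tpcs> elements. It also shows one possible way to build the pattern from a tag name held in a variable, since a literal tag written inside the pattern string is not replaced by the variable's value. The helper name extract_tag and the sample string are hypothetical, not part of the original answer.

```python
import re

def extract_tag(xml, tag):
    """Return the text of the first <tag>...</tag> pair, or None (hypothetical helper)."""
    name = re.escape(tag)  # escape the tag name so it is matched literally
    pattern = r'(?<=<%s>).*?(?=</%s>)' % (name, name)
    match = re.search(pattern, xml)
    return match.group(0) if match else None

# Made-up sample with two <tpcs> elements, to show lazy vs. greedy matching.
sample = '<tpcs>5</tpcs><num>3</num><tpcs>7</tpcs>'

lazy = re.search(r'(?<=<tpcs>).*?(?=</tpcs>)', sample).group(0)
greedy = re.search(r'(?<=<tpcs>).*(?=</tpcs>)', sample).group(0)

print(lazy)    # 5
print(greedy)  # 5</tpcs><num>3</num><tpcs>7
print(extract_tag(sample, 'num'))  # 3
```

The lazy version stops at the first </tpcs>, while the greedy version runs on to the last one. This approach assumes the opening tag has no attributes and that the element is not nested inside another element of the same name.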