c++ regex visual-c++ boost-regex

Get tags of the same level?


Boost throws an exception:

Unhandled exception at 0x000007fefd0aa88d in svg.exe: Microsoft C++ exception: boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<std::runtime_error>>

for the regular expression

opentag = [<][s][p][a][n]
closetag = [<][/][s][p][a][n][>]
    /opentag+"(?>(?>(?!"+opentag+"|"+closetag+").)|(?R))*"+closetag/

This happens when the input string (std::string) is longer than 140 KB.

I need to get the tags at the same level.

std::string input = R"(<span id=1> </span>
<span id=2> <span> </span> </span>
<span id=3> <span> <span> <span> </span> </span> </span> </span>)";

From this input I want three strings:

0) <span id=1> </span>
1) <span id=2> <span> </span> </span>
2) <span id=3> <span> <span> <span> </span> </span> </span> </span>

How can I change the Boost options so that the regex processes a string of any size without throwing an exception? Or how can I get the tags at the same level without a regular expression?


Solution

  • You cannot parse HTML with regular expressions. For normal context-free grammars, I'd suggest just getting a parser, but the realistic state of HTML is that you need a dedicated HTML parser library.
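    If the input really is as restricted as in the question (only well-formed, lowercase <span> tags with no comments, scripts, or attribute values containing "<span"), a plain depth counter can split it into top-level blocks without any regex. The sketch below is an illustration under those assumptions, not a replacement for an HTML parser; the function name top_level_spans is hypothetical and does not come from Boost or the original post. It scans the string once and iteratively, so it does not rely on regex recursion at all.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): returns every top-level
// <span>...</span> block found in `input`, with its nested spans included.
// Assumes well-formed, lowercase span tags; this is not a general HTML parser.
std::vector<std::string> top_level_spans(const std::string& input) {
    const std::string open = "<span";
    const std::string close = "</span>";

    std::vector<std::string> result;
    std::size_t depth = 0;  // current nesting depth of <span> tags
    std::size_t start = 0;  // offset where the current top-level block began
    std::size_t pos = 0;

    while ((pos = input.find('<', pos)) != std::string::npos) {
        if (input.compare(pos, close.size(), close) == 0) {
            pos += close.size();
            // Closing the outermost span completes one top-level block.
            // A stray closing tag at depth 0 is ignored.
            if (depth > 0 && --depth == 0) {
                result.push_back(input.substr(start, pos - start));
            }
        } else if (input.compare(pos, open.size(), open) == 0) {
            const std::size_t after = pos + open.size();
            // Require '>' or whitespace after "<span" so that a tag such as
            // "<spanner>" is not mistaken for a span.
            const bool is_tag =
                after < input.size() &&
                (input[after] == '>' ||
                 std::isspace(static_cast<unsigned char>(input[after])));
            if (is_tag) {
                if (depth == 0) {
                    start = pos;
                }
                ++depth;
            }
            pos = after;
        } else {
            ++pos;  // some other tag or a bare '<'; skip it
        }
    }
    return result;
}
```

    Called on the three-line sample from the question, this should return the three blocks listed there (id=1, id=2, id=3), each with its nested spans intact. Anything beyond that restricted shape of input is where a dedicated HTML parser library becomes necessary.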