python-3.x pyparsing

Add custom text inside nested angle brackets, with an exception for PHP tags


The original question is located here; the current question is about avoiding one remaining problem.

I have this code, which works perfectly with the html_1 data:

from pyparsing import nestedExpr, originalTextFor

html_1 = '''
<html>
<head>
<title><?php echo "title here"; ?></title>
<head>
    <body>
        <h1 <?php echo "class='big'" ?>>foo</h1>
    </body>
</html>
'''

html_2 = '''
<html>
<head>
<title><?php echo "title here"; ?></title>
<head>
    <body>
        <h1 <?php echo $tpl->showStyle(); ?>>foo</h1>
    </body>
</html>
'''

nested_angle_braces = nestedExpr('<', '>')

# for match in nested_angle_braces.searchString(html):
#     print(match)

# nested_angle_braces_with_h1 = nested_angle_braces().addCondition(
#                                             lambda tokens: tokens[0][0].lower() == 'h1')

nested_angle_braces_with_h1 = originalTextFor(
    nested_angle_braces().addCondition(lambda tokens: tokens[0][0].lower() == 'h1')
    )
nested_angle_braces_with_h1.addParseAction(lambda tokens: tokens[0] + 'MY_TEXT')

print(nested_angle_braces_with_h1.transformString(html_1))

The result for the html_1 variable is:

<html>
<head>
<title><?php echo "title here"; ?></title>
<head>
    <body>
        <h1 <?php echo "class='big'" ?>>MY_TEXTfoo</h1>
    </body>
</html>

Everything is fine here: MY_TEXT is placed as expected, in the right region (inside the h1 element).

But let's look at the result for html_2:

<html>
<head>
<title><?php echo "title here"; ?></title>
<head>
    <body>
        <h1 <?php echo $tpl->showStyle(); ?>MY_TEXT>foo</h1>
    </body>
</html>

Now there is an error: MY_TEXT is placed inside the attribute area of the h1 tag, because the PHP code contains a '>' character inside "$tpl->".

How can I fix it? I need to get this result in that region:

<h1 <?php echo $tpl->showStyle(); ?>>MY_TEXTfoo</h1>

Solution

  • The solution requires that we define a special expression for PHP tags, which our simple nestedExpr gets confused by.

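    # pyparsing names used by the snippets in this answer
    from pyparsing import Literal, SkipTo, Word, printables, nestedExpr, originalTextFor
    # define an expression for a PHP tag
    php_tag = Literal('<?') + 'php' + SkipTo('?>', include=True)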
    

    We'll need more than simple strings now for the opener and closer, including a negative lookahead when matching a '<' to make sure we aren't at the leading edge of a PHP tag:

    # define expressions for opener and closer, such that we don't
    # accidentally interpret a PHP tag as a nested expr
    opener = ~php_tag + Literal("<")
    closer = Literal(">")
    

    If opener and closer aren't simple strings, then we need to give a content expression too. Our content will be very simple to define, just PHP tags or other Words of printables, excluding '<' and '>' (you'll end up wrapping this all back up in originalTextFor anyway):

    # define nested_angle_braces to potentially contain PHP tag, or 
    # some other printable (not including '<' or '>' chars)
    nested_angle_braces = nestedExpr(opener, closer, 
                                     content=php_tag | Word(printables, excludeChars="<>"))
    

    Now if I wrap nested_angle_braces in originalTextFor and use searchString to scan html_2, I get:

    for tag in originalTextFor(nested_angle_braces).searchString(html_2):
        print(tag)
    
    ['<html>']
    ['<head>']
    ['<title>']
    ['</title>']
    ['<head>']
    ['<body>']
    ['<h1 <?php echo $tpl->showStyle(); ?>>']
    ['</h1>']
    ['</body>']
    ['</html>']
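
To finish the task from the question, the expressions above can be combined with the same condition and parse action the question already uses. The following is only a sketch under a few assumptions: pyparsing is installed, the camelCase API names used on this page are available, and the variable name `result` is introduced here purely for illustration.

```python
from pyparsing import (Literal, SkipTo, Word, printables,
                       nestedExpr, originalTextFor)

html_2 = '''
<html>
<head>
<title><?php echo "title here"; ?></title>
<head>
    <body>
        <h1 <?php echo $tpl->showStyle(); ?>>foo</h1>
    </body>
</html>
'''

# A PHP tag is consumed as a single unit, so a '>' inside it
# (as in "$tpl->") is never seen as a closer.
php_tag = Literal('<?') + 'php' + SkipTo('?>', include=True)

# A '<' only opens a nested expression if it does not start a PHP tag.
opener = ~php_tag + Literal('<')
closer = Literal('>')

nested_angle_braces = nestedExpr(
    opener, closer,
    content=php_tag | Word(printables, excludeChars='<>'))

# Same condition and parse action as in the question: keep only tags
# whose first token is 'h1' and append MY_TEXT to the matched text.
nested_angle_braces_with_h1 = originalTextFor(
    nested_angle_braces().addCondition(
        lambda tokens: tokens[0][0].lower() == 'h1')
)
nested_angle_braces_with_h1.addParseAction(
    lambda tokens: tokens[0] + 'MY_TEXT')

# `result` is an illustrative name, not part of the original code.
result = nested_angle_braces_with_h1.transformString(html_2)
print(result)
```

Run against html_2, this should produce the `<h1 <?php echo $tpl->showStyle(); ?>>MY_TEXTfoo</h1>` line requested in the question, because the '>' in "$tpl->" is now swallowed as part of the PHP tag instead of closing the h1 tag early.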