Trouble Replacing Text in HTML fragment using Mojo::DOM


I need to scan HTML fragments for certain strings in the text (not in element attributes) and wrap each matching string in a <span></span>. Here's a sample attempt with its output:

use v5.10;
use Mojo::DOM;

my $body = qq|
<div>
<p>Boring Text:</p>
<p>
Highlight Cool whenever we see it.
but not <a href="/Cool.html">here</a>.
<code>
    sub Cool {
        print "Foo\n";
    }
</code>
And here is more Cool.
</p>
</div>
|;
my $dom = Mojo::DOM->new($body);

foreach my $e ($dom->find('*')->each) {
    my $text = $e->text;
    say "e text is:  $text ";
    if ($text =~ /Cool/) {
        (my $newtext = $text ) =~ s/Cool/<span class="fun">Cool<\/span>/g;
        $e->replace_content($newtext);
    }
}

say $dom->root;

The output:

e text is:   
e text is:  Boring Text: 
e text is:  Highlight Cool whenever we see it. but not. And here is more Cool. 
e text is:  here 
e text is:  sub Cool { print "Foo "; } 

<div>
<p>Boring Text:</p>
<p>Highlight <span class="fun">Cool</span> whenever we see it. but not. And here is more <span class="fun">Cool</span>.</p>
</div>

Close, but what I really want to see is something like the following:

<div>
<p>Boring Text:</p>
<p>Highlight <span class="fun">Cool</span> whenever we see it. but not <a href="/Cool.html">here</a>. 
<code>
sub <span class="fun">Cool</span> { 
    print "Foo\n"; 
}
</code>  
And here is more <span class="fun">Cool</span>.</p>
</div>

Any help / pointers would be greatly appreciated. Thanks, Todd


Solution

  • Having looked into XML::Twig, I'm not so sure it's the correct tool. It's surprising how awkward such a simple task can be.

    This is a working program that uses HTML::TreeBuilder. Unfortunately, it doesn't produce formatted output, so I've added some whitespace to the output myself.

    use strict;
    use warnings;
    
    use HTML::TreeBuilder;
    
    my $html = HTML::TreeBuilder->new_from_content(<<__HTML__);
    <div>
    <p>Boring Text:</p>
    <p>
    Highlight Cool whenever we see it.
    but not <a href="/Cool.html">here</a>.
    <code>
        sub Cool {
            print "Foo\n";
        }
    </code>
    And here is more Cool.
    </p>
    </div>
    __HTML__
    
    $html->objectify_text;
    
    for my $text_node ($html->look_down(_tag => '~text')) {
    
      my $text = $text_node->attr('text');
    
      if (my @replacement = process_text($text)) {
        my $old_node = $text_node->replace_with(@replacement);
        $old_node->delete;
      }
    }
    
    $html->deobjectify_text;
    
    print $html->guts->as_XML;
    
    sub process_text {
    
      # A limit of -1 keeps trailing empty fields, so text that ends in "Cool" is not missed
      my @nodes = split /\bCool\b/, shift, -1;
      return unless @nodes > 1;
    
      my $span = HTML::Element->new('span', class => 'fun');
      $span->push_content('Cool');
    
      for (my $i = 1; $i < @nodes; $i += 2) {
        splice @nodes, $i, 0, $span->clone;
      }
    
      $span->delete;
    
      @nodes;
    }
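
    A note on the split in process_text: Perl's split discards trailing empty fields unless it is given a negative limit, which matters when a text node ends with the word being matched. The core-only sketch below illustrates this with plain strings instead of HTML::Element objects; the interleave helper and the '[Cool]' marker are made up for the illustration and are not part of the program above.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: returns the pieces of $text with a marker string
    # inserted wherever the whole word "Cool" was found.
    sub interleave {
        my ($text) = @_;

        # The limit of -1 keeps trailing empty fields, so a match at the
        # very end of the text still produces a (possibly empty) last piece.
        my @parts = split /\bCool\b/, $text, -1;
        return unless @parts > 1;

        # Put one marker between every pair of neighbouring pieces.
        my @out = shift @parts;
        push @out, '[Cool]', $_ for @parts;
        return @out;
    }

    # Without a limit, the trailing empty field is dropped.
    my @plain = split /\bCool\b/, 'more Cool';
    print scalar(@plain), "\n";                          # prints 1

    print join('|', interleave('more Cool')), "\n";      # prints more |[Cool]|
    ```

    With only one piece coming back, a check like "return unless @nodes > 1" treats the text as having no match, so the negative limit is what lets a trailing match through.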
    

    Output of the HTML::TreeBuilder program:

    <div>
    <p>Boring Text:</p>
    <p>
    Highlight <span class="fun">Cool</span> whenever we see it.
    but not <a href="/Cool.html">here</a>.
    <code> sub <span class="fun">Cool</span> { print &quot;Foo &quot;; } </code>
    And here is more <span class="fun">Cool</span>.
    </p>
    </div>
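
    For comparison, the same text-node idea can probably be expressed in Mojo::DOM itself, which is what the question started with. This is only a sketch: it assumes a Mojolicious release recent enough to provide descendant_nodes and type, and the highlight helper is a hypothetical name. It visits text nodes only, so attribute values such as the href are never touched, and it escapes the untouched text before handing the fragment back to the parser.

    ```perl
    use strict;
    use warnings;
    use Mojo::DOM;
    use Mojo::Util qw(xml_escape);

    # Hypothetical helper: wrap each whole-word "Cool" found in a text node.
    sub highlight {
        my ($html) = @_;
        my $dom = Mojo::DOM->new($html);

        # Collect the text nodes first, then edit them one by one.
        my @text_nodes
            = $dom->descendant_nodes->grep(sub { $_->type eq 'text' })->each;

        for my $node (@text_nodes) {
            my $text = $node->content;
            next unless $text =~ /\bCool\b/;

            # The capture group makes split return the matches as well,
            # so the matched words sit at the odd indexes.
            my @parts = split /(\bCool\b)/, $text, -1;

            my $fragment = '';
            for my $i (0 .. $#parts) {
                $fragment .= $i % 2
                    ? qq{<span class="fun">$parts[$i]</span>}
                    : xml_escape($parts[$i]);
            }

            # Replace the text node with the parsed fragment.
            $node->replace($fragment);
        }

        return "$dom";
    }

    print highlight(
        q{<p>Highlight Cool, but not <a href="/Cool.html">here</a>. <code>sub Cool {}</code></p>}
    ), "\n";
    ```

    Unlike the element-by-element loop in the question, nothing here calls text on a parent element, so child elements such as the link and the code block stay in place.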