How can I remove an attribute from all DOM elements with Mojolicious?


I want to remove the bgcolor attribute from all elements of a page I am scraping via Mojolicious.

My attempt has been the following:

$dom->all_contents->each(sub { $_->attr('bgcolor' => undef) });

but this does not seem to work.

How do I do it right?


Solution

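  • The following uses Mojo::DOM to delete the bgcolor attribute from every element. A Mojo::DOM object can be dereferenced as a hash of its attributes, so delete removes the attribute outright: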

    use strict;
    use warnings;
    
    use Mojo::DOM;
    
    my $dom = Mojo::DOM->new(do {local $/; <DATA>});
    
    for my $node ($dom->find('*')->each) {
        delete $node->{bgcolor};
    }
    
    print $dom;
    
    __DATA__
    <html>
    <head>
    <title>Hello background color</title>
    </head>
    <body bgcolor="white">
    <h1>Hello world</h1>
    <table>
    <tr><td bgcolor="blue">blue</td></tr>
    <tr><td bgcolor="green">green</td></tr>
    </table>
    </body>
    </html>
    

    Outputs:

    <html>
    <head>
    <title>Hello background color</title>
    </head>
    <body>
    <h1>Hello world</h1>
    <table>
    <tr><td>blue</td></tr>
    <tr><td>green</td></tr>
    </table>
    </body>
    </html>
    

    Notes:

    1. You can use a CSS selector to limit the returned elements to those that actually have the attribute:

      for my $node ($dom->find('[bgcolor]')->each) { delete $node->{bgcolor} }
      
    2. You can also let Mojo handle the iteration by passing a callback to each:

      $dom->find('*')->each(sub {
          delete $_->{bgcolor};
      });
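
The two notes can be combined, and the result contrasted with the attr(... => undef) approach from the question. The following is a minimal sketch, not a definitive explanation: the sample markup is made up for illustration, and the comment about how an undef value is rendered reflects my understanding of Mojo::DOM and may vary between Mojolicious versions.

```perl
use strict;
use warnings;

use Mojo::DOM;

# Hypothetical sample markup, used only for illustration.
my $html = '<div bgcolor="blue"><p bgcolor="green">hi</p><p>plain</p></div>';

# Setting the value to undef keeps the key in the attribute hash, so the
# attribute is still present when the document is rendered (assumption:
# it shows up as a bare attribute without a value).
my $with_undef = Mojo::DOM->new($html);
$with_undef->find('[bgcolor]')->each(sub { $_->attr(bgcolor => undef) });

# Deleting the key from the attribute hash removes the attribute entirely.
# The '[bgcolor]' selector skips elements that never had the attribute.
my $with_delete = Mojo::DOM->new($html);
$with_delete->find('[bgcolor]')->each(sub { delete $_->{bgcolor} });

print "undef:  $with_undef\n";
print "delete: $with_delete\n";
```

Running this should show the bgcolor name still present on the "undef" line and gone from the "delete" line, which would explain why the original attempt appeared not to work.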