
How to read and change the <!DOCTYPE> declaration and <?xml version="1.0"?> in XML::Twig?


I'm new to XML::Twig. How can I read and change <!DOCTYPE art SYSTEM "loose.dtd"> and <?xml version="1.0" encoding="UTF-8"?>? I don't know how to read or modify these declarations with XML::Twig.

My input:

<?xml version="1.0" encoding="UTF-8"?>

<!DOCTYPE art SYSTEM "loose.dtd">
<art>
<fr>
<p>Text</p>
<p>Text</p>
</fr>
<fr>
<p>Text</p>
<p>Text</p>
</fr>
</art>

The output I need:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<DTD>
<Contents type="&lt;!DOCTYPE art SYSTEM &quot;loose.dtd&quot;&gt;"/>
</DTD>
<art>
<fr>
<p>Text</p>
<p>Text</p>
</fr>
<fr>
<p>Text</p>
<p>Text</p>
</fr>
</art>

How can I alter the <?xml ?> declaration and the <!DOCTYPE> declaration? Can anyone help with this?


Solution

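  • You can try the following (the code is commented). The key idea is to create a new twig, move over the elements you want to keep, and create the parts that change. Because a well-formed XML document has exactly one root element, the DTD and art elements end up wrapped in a new root element: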

    #!/usr/bin/env perl
    
    use warnings;
    use strict;
    use XML::Twig;
    
    ## Create a twig from the input XML file.
    my $twig = XML::Twig->new;
    $twig->parsefile(shift);
    
    ## Create a new twig that will be the output.
    my $new_twig = XML::Twig->new( pretty_print => 'indented' );
    
    ## Create a root tag.
    $new_twig->set_root( XML::Twig::Elt->new( 'root' ) );
    
    ## Create the xml processing instruction.
    my $e = XML::Twig::Elt->new( 'k' => 'v' );
    $e->set_pi( 'xml', 'version="1.0" encoding="UTF-8" standalone="yes"' );
    $e->move( before => $new_twig->root );
    
    ## Move the whole tree from the old twig into the new root.
    my $r = $twig->root;
    $r->paste( first_child => $new_twig->root );
    
    ## Store the old twig's doctype string in a Contents element inside DTD,
    ## then paste DTD as the first child of the new root.
    my $contents_elt = XML::Twig::Elt->new( Contents  => { type => $twig->doctype } );
    my $dtd_elt = XML::Twig::Elt->new( DTD => '#EMPTY' );
    $contents_elt->move( last_child => $dtd_elt );
    $dtd_elt->move( first_child => $new_twig->root );
    
    ## Print the whole twig created.
    $new_twig->print;
    

    Run it like:

    perl script.pl xmlfile
    

    That yields:

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?><root>
      <DTD>
        <Contents type="&lt;!DOCTYPE art SYSTEM &quot;loose.dtd&quot;>&#x0a;"/>
      </DTD>
      <art>
        <fr>
          <p>Text</p>
          <p>Text</p>
        </fr>
        <fr>
          <p>Text</p>
          <p>Text</p>
        </fr>
      </art>
    </root>
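
    If you only need to read the two declarations, or to change the XML declaration without building it by hand as a processing instruction, the twig object also has accessor methods for them. The following is a minimal sketch, assuming xmldecl, doctype and set_standalone behave as described in the XML::Twig documentation. The inline $xml string stands in for the input file, and the variable names are only illustrative:

    ```perl
    use strict;
    use warnings;
    use XML::Twig;

    # Stand-in for the question's input file, kept inline so the sketch is self-contained.
    my $xml = join "\n",
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<!DOCTYPE art SYSTEM "loose.dtd">',
        '<art><fr><p>Text</p></fr></art>';

    my $twig = XML::Twig->new;
    $twig->parse($xml);

    # Read both declarations as strings.
    my $old_decl = $twig->xmldecl;
    my $doctype  = $twig->doctype;

    # Assumption: a true value is serialized as standalone="yes".
    $twig->set_standalone(1);
    my $new_decl = $twig->xmldecl;

    print "before:  $old_decl";
    print "after:   $new_decl";
    print "doctype: $doctype";
    ```

    This only covers reading the declarations and adding the standalone attribute. Turning the doctype into a DTD/Contents element still needs the element-building steps shown above. Check the printed declaration against your installed XML::Twig version before relying on it.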