
Parsing XML data into a different structure


I have the XML file below and I am trying to parse it:

<Books>
  <book name="first" feed="contentfeed" mode="modes" />
  <book name="first" feed="communityfeed" mode="modes" region="1"/>
  <book name="second" feed="contentfeed" mode="" />
  <book name="second" feed="communityfeed" mode="modes" />
  <book name="second" feed="articlefeed" mode="modes" /> 
</Books>

I am using Perl 5.8 together with XML::Simple. This is the code I have written:

    use XML::Simple;

    my $xs = new XML::Simple( KeyAttr => { book => 'name' } , ForceArray => [ 'book','name' ] );
    my $config = $xs->XMLin( <complete path to xml file> );

Below is the result (displayed using Data::Dumper)

'book' => {
    'first' => {
        'feed'   => 'communityfeed',
        'mode'   => 'modes',
        'region' => '1'
    },
    'second' => {
        'feed' => 'articlefeed',
        'mode' => 'modes'
    },
}

Instead I would like to have the output in the format below

'book' => {
    'first' => {
        'communityfeed' => { mode => 'modes', region => '1' },
        'contentfeed'   => { mode => 'modes' }
    },
    'second' => {
        'communityfeed' => { mode => 'modes' },
        'contentfeed'   => { mode => '' },
        'articlefeed'   => { mode => 'modes' }
    },
}

Notes

  1. The XML file format cannot be changed, as it is the current production version.
  2. Perl 5.8 is preferred, as the parent script uses this version and the parsing logic will be merged into it.

Has anyone encountered this kind of issue before? If so, how can it be tackled?


Solution

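  • XML::Simple is an awkward and frustrating module to use, and I doubt very much that you can persuade it to build the data structure you require. With KeyAttr => { book => 'name' } it folds the book elements into a hash keyed on name, so every element that repeats a name overwrites the previous one; that is why only the last feed for each book survives in your dump. Almost any other XML parser would be a step forward.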

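    Here's a solution using XML::Twig. You can interrogate the parsed XML data and build whatever data structure you like from it. For each book element, the code takes the attribute hash, removes name and feed to use as the two levels of keys, and stores the remaining attributes underneath them.

    I've used Data::Dump only to display the resulting data.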

    use strict;
    use warnings 'all';
    
    use XML::Twig;
    
    my $config;
    
    {
        my $twig = XML::Twig->new;
        $twig->parsefile('books.xml');
    
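        # Visit every <book> element directly under the <Books> root
        for my $book ( $twig->findnodes('/Books/book') ) {
    
            # atts returns a reference to the element's attribute hash;
            # deleting the slice removes name and feed and returns their values
            my $atts = $book->atts;
            my ( $name, $feed ) = delete @{$atts}{qw/ name feed /};
    
            # Whatever is left (mode, region, ...) becomes the innermost hash
            $config->{book}{$name}{$feed} = $atts;
        }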
    }
    
    use Data::Dump;
    dd $config;
    

    output (Data::Dump displays the numeric string '1' without quotes)

    {
      book => {
        first  => {
                    communityfeed => { mode => "modes", region => 1 },
                    contentfeed   => { mode => "modes" },
                  },
        second => {
                    articlefeed   => { mode => "modes" },
                    communityfeed => { mode => "modes" },
                    contentfeed   => { mode => "" },
                  },
      },
    }
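
If switching parsers is not an option, a similar regrouping can probably be applied to XML::Simple's own output. The sketch below assumes that calling XMLin with KeyAttr => [] and ForceArray => ['book'] turns off the folding on name and yields a plain list of attribute hashes under the book key. The @books array stands in for that list so the example runs without any XML module, and group_books is a hypothetical helper, not part of either module. It uses only core Perl, so it should also run on 5.8.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::Simple or XML::Twig): regroup a flat
# list of attribute hashes first by name, then by feed.
sub group_books {
    my ($books) = @_;
    my %grouped;

    for my $book (@$books) {
        my %atts = %$book;                # copy, so the input list is left untouched
        my $name = delete $atts{name};
        my $feed = delete $atts{feed};

        # Skip entries that lack either key instead of creating undef keys
        next unless defined $name && defined $feed;

        # The remaining attributes (mode, region, ...) form the innermost hash
        $grouped{$name}{$feed} = \%atts;
    }

    return { book => \%grouped };
}

# Stand-in data. ASSUMPTION: this mirrors what $data->{book} would hold after
#   my $data = XML::Simple->new( KeyAttr => [], ForceArray => ['book'] )
#                         ->XMLin($path_to_xml);
my @books = (
    { name => 'first',  feed => 'contentfeed',   mode => 'modes' },
    { name => 'first',  feed => 'communityfeed', mode => 'modes', region => '1' },
    { name => 'second', feed => 'contentfeed',   mode => '' },
    { name => 'second', feed => 'communityfeed', mode => 'modes' },
    { name => 'second', feed => 'articlefeed',   mode => 'modes' },
);

my $config = group_books( \@books );

# Look up a single value in the regrouped structure
print $config->{book}{first}{communityfeed}{region}, "\n";    # prints 1
```

Copying each attribute hash before deleting from it keeps the parsed data intact, which may matter if the parent script reads the original list elsewhere.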