Tags: perl, hash, xml-parsing, subroutine, xml-twig

Returning a hash of the parsed document (XML::Twig in Perl) for processing in other subs


I'm failing badly at returning a hash of the parsed XML document from XML::Twig so that I can use it in other subs that perform several validation checks. The goal is abstraction and reusable blocks of code.

XML Block:

<?xml version="1.0" encoding="utf-8"?>
<Accounts locale="en_US">
  <Account>
    <Id>abcd</Id>
    <OwnerLastName>asd</OwnerLastName>
    <OwnerFirstName>zxc</OwnerFirstName>
    <Locked>false</Locked>
    <Database>mail</Database>
    <Customer>mail</Customer>
    <CreationDate year="2011" month="8" month-name="fevrier" day-of-month="19" hour-of-day="15" minute="23" day-name="dimanche"/>
    <LastLoginDate year="2015" month="04" month-name="avril" day-of-month="22" hour-of-day="11" minute="13" day-name="macredi"/>
    <LoginsCount>10405</LoginsCount>
    <Locale>nl</Locale>
    <Country>NL</Country>
    <SubscriptionType>free</SubscriptionType>
    <ActiveSubscriptionType>free</ActiveSubscriptionType>
    <SubscriptionExpiration year="1980" month="1" month-name="janvier" day-of-month="1" hour-of-day="0" minute="0" day-name="jeudi"/>
    <SubscriptionMonthlyFee>0</SubscriptionMonthlyFee>
    <PaymentMode>Undefined</PaymentMode>
    <Provision>0</Provision>
    <InternalMail>[email protected]</InternalMail>
    <ExternalMail>[email protected]</ExternalMail>
    <GroupMemberships>
      <Group>werkgroep X.Y.Z.</Group>
    </GroupMemberships>
    <SynchroCount>6</SynchroCount>
    <LastSynchroDate year="2003" month="12" month-name="decembre" day-of-month="5" hour-of-day="12" minute="48" day-name="mardi"/>
    <HasActiveSync>false</HasActiveSync>
    <Company/>
  </Account>
  <Account>
    <Id>mnbv</Id>
    <OwnerLastName>cvbb</OwnerLastName>
    <OwnerFirstName>bvcc</OwnerFirstName>
    <Locked>true</Locked>
    <Database>mail</Database>
    <Customer>mail</Customer>
    <CreationDate year="2012" month="10" month-name="octobre" day-of-month="10" hour-of-day="10" minute="18" day-name="jeudi"/>
    <LastLoginDate/>
    <LoginsCount>0</LoginsCount>
    <Locale>fr</Locale>
    <Country>BE</Country>
    <SubscriptionType>free</SubscriptionType>
    <ActiveSubscriptionType>free</ActiveSubscriptionType>
    <SubscriptionExpiration year="1970" month="1" month-name="janvier" day-of-month="1" hour-of-day="1" minute="0" day-name="jeudi"/>
    <SubscriptionMonthlyFee>0</SubscriptionMonthlyFee>
    <PaymentMode>Undefined</PaymentMode>
    <Provision>0</Provision>
    <InternalMail/>
    <ExternalMail>[email protected]</ExternalMail>
    <GroupMemberships/>
    <SynchroCount>0</SynchroCount>
    <LastSynchroDate year="1970" month="1" month-name="janvier" day-of-month="1" hour-of-day="1" minute="0" day-name="jeudi"/>
    <HasActiveSync>false</HasActiveSync>
    <Company/>
  </Account>
</Accounts>

Perl Block:

my $file = shift || (print "NOTE: \tYou didn't provide the name of the file to be checked.\n" and exit);
my $twig = XML::Twig -> new ( twig_roots => { 'Account' => \& parsing } ); #'twig_roots' mode builds only the required sub-trees from the document while ignoring everything outside that twig.
$twig -> parsefile ($file);

sub parsing {
    my ( $twig, $accounts ) = @_;
    my %hash = @_;
    my $ref = \%hash; # because I was getting an 'Odd number of hash elements' error
    return $ref;
    $twig -> purge;
}

It gives a hash reference, which I'm unable to dereference properly (even after many, many attempts).
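For reference, this is how a hash reference returned from a sub is dereferenced in Perl. It is a minimal core-Perl sketch; `get_account` and its data are hypothetical stand-ins for the parsing sub, not part of the original code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the parsing sub: it builds a hash and returns a
# reference to it, so the caller receives a single scalar.
sub get_account {
    my %account = (
        Id           => 'abcd',
        Locked       => 'false',
        CreationDate => { year => 2011, month => 8 },   # nested hash ref
    );
    return \%account;
}

my $ref = get_account();          # $ref is a HASH reference

my %copy = %$ref;                 # dereference the whole hash (shallow copy)
print "Id: $ref->{Id}\n";         # one element via the arrow operator
print "Year: $ref->{CreationDate}{year}\n";   # arrow is optional between subscripts
for my $key (sort keys %{$ref}) { # block form %{ ... }, same as %$ref
    print "$key\n";
}
```

The nested `CreationDate` entry is itself a reference, so `%copy` shares it with `%$ref`; only the top level is copied.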

Again, I just need a single clean sub that does the parsing and returns a hash of all elements ('Accounts' in this case), to be used in another function (valid_sub) that performs the validation checks.

I'm stuck at this point and would highly appreciate your help.


Solution

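  • XML::Twig does not create such a hash for you; you have to build it yourself. Also, whatever a handler returns is not passed back to the code that called parsefile, so returning $ref from parsing() gets you nothing.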

    Beware: statements after return are never reached, so the $twig->purge in your sub never runs.

    #!/usr/bin/perl
    use warnings;
    use strict;
    
    use XML::Twig;
    use Data::Dumper;
    
    my $twig = 'XML::Twig'->new(twig_roots => { Account => \&account });
    $twig->parsefile(shift);
    
    sub account {
        my ($twig, $account) = @_;
        my %hash;
        for my $ch ($account->children) {
            my $text = $ch->text;
            if (length $text) {              # test length, not truth: "0" is valid text
                $hash{ $ch->name } = $text;
            } else {                         # empty element: keep its attributes
                my $atts = $ch->atts || {};
                for my $attr (keys %$atts) {
                    $hash{ $ch->name }{$attr} = $atts->{$attr};
                }
            }
        }
        print Dumper \%hash;
        $twig -> purge;                      # free the part of the tree already handled
        validate(\%hash);                    # pass a reference to the other sub
    }
    

    Handling of nested elements (e.g. GroupMemberships) is left as an exercise for the reader.

    And for validation:

    sub validate {
        my $account = shift;                 # hash reference built by account()
        if ('abcd' eq $account->{Id}) {
            ...                              # your checks here
        }
    }
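If you want exactly what the question describes (one sub that parses and returns the data, another that validates it), a sketch could collect every account into a hash keyed by `<Id>` and return a reference to it. This is an illustrative variation on the answer above, not its code: `parse_accounts` and the checks in `valid_sub` are hypothetical, and storing GroupMemberships as an array of group names and truly empty elements as `''` are assumptions. It requires XML::Twig.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: parse $file and return a reference to a hash of
# accounts keyed by <Id>. The handler is a closure over %accounts, so no
# global variable is needed.
sub parse_accounts {
    my ($file) = @_;
    my %accounts;
    my $twig = XML::Twig->new(
        twig_roots => {
            Account => sub {
                my ($t, $account) = @_;
                my %hash;
                for my $ch ($account->children) {
                    my $name = $ch->name;
                    my $text = $ch->text;
                    if ($name eq 'GroupMemberships') {
                        # Assumption: nested groups become an array ref of names
                        $hash{$name} = [ map { $_->text } $ch->children('Group') ];
                    } elsif (length $text) {
                        $hash{$name} = $text;            # plain text, including "0"
                    } else {
                        # date-like elements keep their attributes;
                        # assumption: truly empty elements become ''
                        my $atts = $ch->atts || {};
                        $hash{$name} = %$atts ? { %$atts } : '';
                    }
                }
                $accounts{ $hash{Id} } = \%hash;
                $t->purge;    # drop the already-processed part of the tree
            },
        },
    );
    $twig->parsefile($file);
    return \%accounts;
}

# Hypothetical checks standing in for your valid_sub: returns a list of
# problem descriptions (an empty list means everything passed).
sub valid_sub {
    my ($accounts) = @_;
    my @problems;
    for my $id (sort keys %$accounts) {
        my $acc = $accounts->{$id};
        push @problems, "$id is locked"
            if ($acc->{Locked} // '') eq 'true';
        push @problems, "$id has no InternalMail"
            if ($acc->{InternalMail} // '') eq '';
    }
    return @problems;
}
```

Usage would be along the lines of `my $accounts = parse_accounts($file); print "$_\n" for valid_sub($accounts);`. The trade-off versus the answer's approach is memory: every account is kept in `%accounts` until parsing finishes, whereas validating inside the handler lets `purge` keep only one account in memory at a time.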