Filehandles and XML::Simple -> Memory corruption. Can't isolate problem


In a small test file, I can run

#!/usr/bin/perl
use warnings;
use strict;
use open qw{:utf8 :std};
use XML::Simple;

my @cmdline = ("hg", "log", "-v", "--style", "xml");
open my $xml, "@cmdline |";

my $xmllog = XMLin($xml, ForceArray => ['logentry', 'parent', 'copy', 'path']);

foreach my $rev (@{$xmllog->{logentry}}) {
    #do stuff
}

and it works fine. When I run the same code in a larger program (with the same XML input), it terminates with

*** glibc detected *** /usr/bin/perl: malloc(): memory corruption: 0x0a40e308 ***

(full crash log @ pastebin.com)

However, if I make the following substitution

#open my $xml, "@cmdline |";
my $xml = `@cmdline`;

then it works (in both files), so this is more a question of curiosity than a real problem for me.
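
A middle ground between the two variants is to keep the pipe open but read all of its output into a string before parsing. The sketch below is only an illustration under assumptions: `run_and_slurp` is a hypothetical helper that is not part of ikiwiki, and the running perl stands in for the `hg log` command so that the example is self-contained.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): run a command
# without a shell and return everything it wrote to stdout.
sub run_and_slurp {
    my @cmd = @_;

    # List-form pipe open: the arguments are passed to the command
    # as-is, so no shell quoting is involved.
    open(my $fh, '-|', @cmd)
        or die "Cannot run $cmd[0]: $!";

    # Undefine the input record separator locally to read all output at once.
    my $output = do { local $/; <$fh> };

    # Closing a pipe waits for the child process; close returns false
    # if the command exited with a non-zero status (available in $?).
    close($fh)
        or die "Command '$cmd[0]' failed, exit status " . ($? >> 8);

    return $output;
}

# Stand-in for ("hg", "log", "-v", "--style", "xml"): the running perl
# prints a tiny XML document.
my $xml = run_and_slurp($^X, '-e', 'print qq{<log><logentry revision="1"/></log>\n}');
print $xml;
```

The resulting string could then be passed to XMLin() in the same way as the backtick output, while still checking that the command ran and exited cleanly.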

  1. Does anyone have any pointers on what the difference between my test case and the larger code base might be?
  2. Is there a speed/memory/? difference in the different command calls? Best practices?

Debian Sid: Perl 5.12.4-1.

(This is my first Perl encounter, so don't assume too much about what I "should" know about the language. I just dove into existing code.)

(The larger program is ikiwiki, so the code is not a secret, but I don't know where to look for trouble, and I can't include all the code in this post for practical reasons. This concerns the Mercurial backend.)


As suggested by cjm, I added print "$_\n" for sort grep /XML/, keys %INC; which gave the output

RPC/XML.pm
RPC/XML/Client.pm
RPC/XML/ParserFactory.pm
XML/NamespaceSupport.pm
XML/Parser.pm
XML/Parser/Expat.pm
XML/SAX.pm
XML/SAX/Base.pm
XML/SAX/Exception.pm
XML/SAX/Expat.pm
XML/SAX/ParserFactory.pm
XML/Simple.pm

in the large project, and

XML/NamespaceSupport.pm
XML/Parser.pm
XML/Parser/Expat.pm
XML/SAX.pm
XML/SAX/Base.pm
XML/SAX/Exception.pm
XML/SAX/Expat.pm
XML/SAX/ParserFactory.pm
XML/Simple.pm

in the test file.
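
For context, %INC records every file that has been loaded through use or require, keyed by the module's path (for example XML/Simple.pm), which is why the listing differs between the two programs. A minimal sketch of the same check, assuming two core modules as stand-ins for the XML modules and a hypothetical helper named `loaded_modules`:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Core modules used as stand-ins for XML::Simple and its parsers.
use File::Spec;
use List::Util ();

# Hypothetical helper: sorted names of loaded module files that
# match a pattern. Intended to be called in list context.
sub loaded_modules {
    my ($pattern) = @_;
    return sort grep { $_ =~ $pattern } keys %INC;
}

# Same idea as: print "$_\n" for sort grep /XML/, keys %INC;
print "$_\n" for loaded_modules(qr{^(?:File|List)/});
```

Because modules can be loaded lazily, the check is most informative when it runs after the call of interest, here after XMLin().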


Update: I installed the Debian package libxml-libxml-perl and added $XML::SAX::ParserPackage = "XML::LibXML::SAX"; as suggested. This also crashed, with a different message this time:

*** stack smashing detected ***: /usr/bin/perl terminated

full backtrace @ pastebin.com

This time it happened consistently in both the large and the small file, though. Also, only when using open, not when using backticks.

I also installed libxml-libxml-simple-perl, but as far as I understand it is in practice just a wrapper that always uses XML::LibXML as the parser. It also behaved differently and complained about the options passed to XMLin(), so I discarded it.

Explicitly (and blindly) forcing the program to use each of the alternatives listed by print "$_\n" for sort grep /XML/, keys %INC; seems to indicate that XML::SAX::Expat is used by default, as cjm said: all other alternatives exit with errors, while XML::SAX::Expat behaves exactly like the original problem in both files. (Explicitly demanding XML::Simple goes into a loop that allocates all my memory.)

I'm thankful for learning about different XML parsers and that XML::Simple automatically chooses different ones. Both parts of my original question somewhat remain though:

  1. Why do the programs behave differently? Even if I explicitly set $XML::SAX::ParserPackage = "XML::SAX::Expat" in both programs, one crashes (using open) and the other works.
  2. Should I use another method to receive output from the external command? Is it even wrong to expect XMLin() to work with open (but then why does it work in one case)?

Or are these simply the "wrong" questions to ask (i.e. irrelevant)?


UPDATE: More than a week has passed without a flurry of activity here, and I now solve this a bit differently, without problems. I have marked cjm's answer as correct, since it got me further in the error analysis. Thanks!


Solution

  • XML::Simple is pure-Perl, so it's unlikely to cause the memory corruption you report. It depends on a lower-level XML parser, and it's likely the bug you've encountered is in there. But there are multiple parsers it could be using, and we'd need to know which one.

    Try adding this line right after the XMLin line in your sample program, and update your question with the results:

    print "$_\n" for sort grep /XML/, keys %INC;
    

    This will tell us which XML parser you're actually using on your system.


    Update: Since it looks like you're using XML::Parser (through its SAX interface, XML::SAX::Expat), I'd suggest trying XML::LibXML::SAX instead. Libxml2 is considered one of the better XML parsers.

    If you don't already have XML::LibXML::SAX installed, just installing it should switch your default SAX parser to it. If it is installed, try putting

    $XML::SAX::ParserPackage = "XML::LibXML::SAX";
    

    at the beginning of your program. (See XML::SAX::ParserFactory for how the SAX parser is selected.)
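
A minimal sketch of that setting in a complete program, assuming XML::Simple and XML::LibXML are both installed. The inline XML string is made up for illustration and stands in for the real hg log output:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple;

# Ask XML::SAX::ParserFactory for a specific SAX parser instead of
# letting it pick the default one.
$XML::SAX::ParserPackage = 'XML::LibXML::SAX';

# Made-up sample input, standing in for the output of hg log.
my $xmllog = XMLin(
    '<log><logentry revision="1"><msg>first</msg></logentry></log>',
    ForceArray => ['logentry'],
);

print scalar(@{ $xmllog->{logentry} }), " log entry parsed\n";

# Show which XML modules were actually loaded for the parse.
print "$_\n" for sort grep /XML/, keys %INC;
```

If the setting takes effect, the %INC listing should include XML/LibXML/SAX.pm, which confirms the parser that was used.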