
How do I compress XML from XML::Simple::XMLout?


I am using XML::Simple to parse and edit a very large XML file, and speed is essential (of all the methods I have tried so far, XML::Simple has been the fastest).

Once all my edits are completed, I print the XML to a document using XMLout(). It prints the output with proper indentation, which would be nice if it were read by humans but is completely useless in my situation.

The output file is 1.2 MB without whitespace and 15 MB with it.

I have been using:

my $string = XMLout($data);
$string =~ s/>[\s]*</></g;
print $out $string;

But this seems to be an extreme CPU hog, and it also takes an enormous amount of memory.

Is there a way to simply output my XML object as proper XML without all the useless whitespace?

Thanks


Solution

  • Look at the NoIndent option. From the XML::Simple manpage:

    NoIndent => 1 # out - seldom used

    Set this option to 1 to disable XMLout()'s default 'pretty printing' mode. With this option enabled, the XML output will all be on one line (unless there are newlines in the data) - this may be easier for downstream processing.

    NormaliseSpace => 0 | 1 | 2 # in - handy

    This option controls how whitespace in text content is handled. Recognised values for the option are:

    • 0 = (default) whitespace is passed through unaltered (except of course for the normalisation of whitespace in attribute values which is mandated by the XML recommendation)

    • 1 = whitespace is normalised in any value used as a hash key (normalising means removing leading and trailing whitespace and collapsing sequences of whitespace characters to a single space)

    • 2 = whitespace is normalised in all text content

      Note: you can spell this option with a 'z' if that is more natural for you.
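
Applied to the code in the question, the NoIndent option might be used roughly as in the sketch below. The sample $data structure and the RootName value are made up for illustration and stand in for the real parsed document. Note that NormaliseSpace is marked "in" in the manpage, so it applies to XMLin() rather than XMLout() and is not used here.

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical sample data standing in for the large parsed document.
my $data = {
    item => [
        { id => 1, name => 'first'  },
        { id => 2, name => 'second' },
    ],
};

# Default behaviour: indented, one element per line.
my $pretty = XMLout($data, RootName => 'root');

# NoIndent => 1 disables pretty printing, so no regex post-processing is needed.
my $compact = XMLout($data, RootName => 'root', NoIndent => 1);

printf "indented: %d bytes\n", length $pretty;
printf "compact:  %d bytes\n", length $compact;
print "$compact\n";
```

Because the whitespace is never generated in the first place, this should avoid both the extra regex pass and the second large copy of the string that the s/>[\s]*</></g approach requires.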