Tags: perl, utf-8, xquery, xml-rpc, exist-db

Issues when adding XML content with UTF-8 characters to an eXist-db collection using Perl


I am trying to add dynamically generated XML content to an eXist-db collection using Perl (see addFile.pl below). Whenever the content contains non-ASCII UTF-8 characters, I receive the error: Failed to parse XML-RPC request: Byte "195" is not a member of the (7-bit) ASCII character set.

#!/usr/bin/perl
use RPC::XML;
use RPC::XML::Client;

my ($sec, $min, $hour, $mday, $mon, $year) = localtime();
my $timestamp = sprintf("%04d%02d%02d%02d%02d%02d",$year+1900,$mon+1,$mday,$hour,$min,$sec);
print("Timestamp: $timestamp\n");

my $FILENAME = "$timestamp.xml";
my $COLLECTION = 'output';

my $record = <<END;
<document id="doc_20150419014112">
  <text>ñáéíóú</text>
</document>
END

$query = <<END;
xquery version "3.0";
import module namespace xmldb="http://exist-db.org/xquery/xmldb";
declare variable \$filename := '$FILENAME';
declare variable \$record := '';

let \$log-in := xmldb:login("/db", "admin", "admin")
(: let \$create-collection := xmldb:create-collection("/db", "$COLLECTION") :)
let \$record := 
$record

for \$target in ('/db/$COLLECTION')
  return xmldb:store(\$target, \$filename, \$record)
END

print $query;

$URL = "http://admin:admin\@localhost:8080/exist/xmlrpc";
# connecting to $URL...
$client = new RPC::XML::Client $URL;
# Output options
$options = RPC::XML::struct->new(
    'indent' => 'yes', 
    'encoding' => 'UTF-8',
    'highlight-matches' => 'none');
$req = RPC::XML::request->new("query", $query, 20, 1, $options);
$response = $client->send_request($req);
if($response->is_fault) {
    die "An error occurred: " . $response->string . "\n";
}
my $result = $response->value;
print $result;

When I run the XQuery script (see below) directly in eXide, it runs normally, but when I run it through the Perl script I receive the following:

$ perl addFile.pl 

Timestamp: 20150428162016
xquery version "3.0";
import module namespace xmldb="http://exist-db.org/xquery/xmldb";
declare variable $filename := '20150428162016.xml';
declare variable $record := '';

let $log-in := xmldb:login("/db", "admin", "admin")
(: let $create-collection := xmldb:create-collection("/db", "output") :)
let $record := 
<document id="doc_20150419014112">
  <text>ñáéíóú</text>
</document>


for $target in ('/db/output')
  return xmldb:store($target, $filename, $record)
An error occurred: Failed to parse XML-RPC request: Byte "195" is not a member of the (7-bit) ASCII character set.
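
For reference, the byte value in the error message can be traced back to the sample record. The following is a minimal sketch using only core Perl modules; the helper name utf8_bytes is hypothetical and introduced purely for illustration. It shows that "ñ" (U+00F1), the first non-ASCII character in the record, is encoded in UTF-8 as two bytes, the first of which is 195.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: returns the UTF-8 byte values of a character string
# as a list of integers.
sub utf8_bytes {
    my ($chars) = @_;
    return unpack('C*', encode('UTF-8', $chars));
}

# "\x{F1}" is U+00F1 (the character "ñ").
print join(' ', utf8_bytes("\x{F1}")), "\n";   # prints "195 177"
```

Any byte above 127 falls outside 7-bit ASCII, so a receiver that has been told the message is ASCII will reject it at that first byte.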

Solution

  • I found the solution elsewhere; I quote the answer here for reference:

    The RPC::XML Perl module uses us-ascii as its XML encoding by default. If you are delivering UTF-8 content from a database or other sources, RPC::XML produces invalid XML with the default setting.

    The XML encoding used by RPC::XML can only be changed globally:

    #!/usr/bin/perl
    use RPC::XML;
    use RPC::XML::Client;
    $RPC::XML::ENCODING = 'utf-8';
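
The effect of that global setting can be checked without a running eXist-db server by serializing a request locally and looking at its XML declaration. The following is a rough sketch, not a definitive implementation: the helper xml_declaration_for is hypothetical, and the payload is assumed to hold raw UTF-8 bytes, as it would in a script saved as UTF-8 without "use utf8".

```perl
#!/usr/bin/perl
use strict;
use warnings;
use RPC::XML;

# Hypothetical helper (not part of RPC::XML): builds a "query" request under
# the given encoding and returns the XML declaration of the serialized message.
sub xml_declaration_for {
    my ($encoding) = @_;

    # "local" restores the previous global value when the sub returns.
    local $RPC::XML::ENCODING = $encoding;

    # Assumption: raw UTF-8 bytes for "ñ" (195, 177) stand in for the record.
    my $req = RPC::XML::request->new('query', "<text>\xC3\xB1</text>", 20, 1);

    # Capture the leading <?xml ... ?> declaration of the serialized request.
    my ($declaration) = $req->as_string =~ /\A(<\?xml[^>]*\?>)/;
    return $declaration;
}

print xml_declaration_for('us-ascii'), "\n";   # the module's default setting
print xml_declaration_for('utf-8'), "\n";      # the setting from the answer
```

With the default, the declaration announces us-ascii while the body still carries byte 195, which matches the parse error reported by the server. In addFile.pl the assignment to $RPC::XML::ENCODING would need to run before the request is sent, for example right after the use lines as in the quoted answer.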