How to handle default namespaces when parsing XML files


My PHP page must parse input XML files (XLIFF, to be precise), but it doesn't work when a default namespace is declared on the root element of the XML file.

My code assumes that a default namespace is required and that it must be urn:oasis:names:tc:xliff:document:1.2. If it is found on the XLIFF root element, it is read from there; otherwise my PHP code adds it. I thought this was working, but it seems it's not: at the moment, the only way I can make it work is to remove the default namespace from the input XLIFF file. Of course, the PHP script should work regardless of whether the default namespace is present in the XLIFF file or not.

Assuming that a default namespace is necessary, my PHP script does the following:

$xml_file = file_get_contents($pathToInputFile);
if($xml_file === FALSE) {
    die("There is a problem getting the contents of the XLIFF file");
} 

$xliffObj = new DOMDocument();
$xliffObj->preserveWhiteSpace = true;
$xliffObj->loadXML($xml_file);

$context = $xliffObj->documentElement;
$xpath = new DOMXPath($xliffObj);

if (isset($context->getAttributeNode('xmlns')->nodeValue)) {
    // the root element declares a default namespace: read it from there
    $ns = $context->getAttributeNode('xmlns')->nodeValue;
    echo "The ns is: " . $ns;
}
else {
    $ns = "urn:oasis:names:tc:xliff:document:1.2";
    // this works when no default namespace is defined in the XLIFF file
    echo "I have defined the ns as: " . $ns;
}

$xpath->registerNamespace('ns', $ns);

$tus = $xpath->query('//trans-unit');
var_dump_pre($tus); die; // var_dump_pre() is my own debugging helper (not shown)

The parsing works fine if my input XLIFF file has:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xliff PUBLIC "-//XLIFF//DTD XLIFF//EN" "http://www.oasis-open.org/committees/xliff/documents/xliff.dtd">
<xliff xmlns:pisa="http://www.ets.org/pisa" version="1.2">

In that case, the output is:

I have defined the ns as: urn:oasis:names:tc:xliff:document:1.2

object(DOMNodeList)#12 (1) { ["length"]=> int(2) }

$tus contains the two trans-unit nodes in the XLIFF file, as expected.

However, when the file has

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xliff PUBLIC "-//XLIFF//DTD XLIFF//EN" "http://www.oasis-open.org/committees/xliff/documents/xliff.dtd">
<xliff xmlns:pisa="http://www.ets.org/pisa" version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">

then nothing is extracted: $tus, the node list that should hold the trans-unit elements, is empty (its length is 0). The output is:

The ns is: urn:oasis:names:tc:xliff:document:1.2

object(DOMNodeList)#10 (1) { ["length"]=> int(0) }

As you can see, $tus is empty.

A potential solution could be to simply remove the namespace declaration before adding it again, but I would like to understand what the problem is. Thanks.
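The cause can be reproduced in isolation. XPath 1.0, as used by DOMXPath, has no notion of a default namespace: an unprefixed name test such as trans-unit only matches elements that are in no namespace. Once the root element declares xmlns="urn:oasis:names:tc:xliff:document:1.2", every unprefixed element below it, trans-unit included, belongs to that namespace, so //trans-unit finds nothing and the registered prefix has to appear in the expression. A minimal sketch, assuming a hypothetical cut-down XLIFF document with two trans-unit elements (not the original input file):

```php
<?php
// Hypothetical cut-down XLIFF document whose root declares a default namespace.
$xml = <<<'XML'
<xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" version="1.2">
  <file>
    <body>
      <trans-unit id="1"><source>Hello</source></trans-unit>
      <trans-unit id="2"><source>World</source></trans-unit>
    </body>
  </file>
</xliff>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Unprefixed name test: only matches <trans-unit> elements in NO namespace,
// so it finds nothing in this document.
$unprefixed = $xpath->query('//trans-unit')->length;

// Bind a prefix to the XLIFF namespace URI and use it in the expression.
$xpath->registerNamespace('ns', 'urn:oasis:names:tc:xliff:document:1.2');
$prefixed = $xpath->query('//ns:trans-unit')->length;

echo "unprefixed: $unprefixed, prefixed: $prefixed", PHP_EOL;
// prints: unprefixed: 0, prefixed: 2
```

The prefix name ns is arbitrary; only the URI it is bound to has to match the one declared in the document.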


Solution

  • It seems the namespace has to be registered with DOMXPath, and its prefix used in the XPath expression, only when it is present in the XML file:

    $xpath->registerNamespace('ns', $ns);
    $tus = $xpath->query('//ns:trans-unit');
    

    However, I'm not sure whether this could backfire in other situations...

    When it is not present, it seems the prefix must be left out of the XPath expression (and registering the namespace is unnecessary):

    #$xpath->registerNamespace('ns', $ns);
    $tus = $xpath->query('//trans-unit');
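One way to handle both cases without guessing is to ask the document which namespace its root element is actually in: DOMNode's namespaceURI property is null when no namespace applies and holds the URI otherwise, so the query can be built accordingly. A minimal sketch, assuming (as in XLIFF files) that the trans-unit elements share the root element's namespace; findTransUnits() is a hypothetical helper name and the prefix x is arbitrary:

```php
<?php
/**
 * Hypothetical helper: returns all <trans-unit> elements, whether or not the
 * root element declares a default namespace.
 */
function findTransUnits(DOMDocument $doc): DOMNodeList
{
    $xpath = new DOMXPath($doc);
    // Namespace the root element is actually in; null if there is none.
    $ns = $doc->documentElement->namespaceURI;

    if ($ns === null) {
        // No default namespace: the elements are in no namespace, so no prefix.
        return $xpath->query('//trans-unit');
    }

    // Default namespace present: bind a prefix to it and use that prefix.
    $xpath->registerNamespace('x', $ns);
    return $xpath->query('//x:trans-unit');
}

// Usage with two hypothetical documents, without and with a default namespace.
$body = '<file><body><trans-unit id="1"/><trans-unit id="2"/></body></file>';

$plain = new DOMDocument();
$plain->loadXML('<xliff version="1.2">' . $body . '</xliff>');

$withNs = new DOMDocument();
$withNs->loadXML('<xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" version="1.2">' . $body . '</xliff>');

echo findTransUnits($plain)->length, ' ', findTransUnits($withNs)->length, PHP_EOL;
// prints: 2 2
```

An alternative that ignores namespaces altogether is the expression //*[local-name()='trans-unit'], at the cost of also matching trans-unit elements from any other namespace.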