
The Curse of XML and PHP Whitespace


I am having an issue with DOMDocument and whitespace. Currently I have two types of XML files. One file, which I will call file A, was created manually about a year ago. The second file, file B, is generated using a PHP DOMDocument. I have been trying very hard (unsuccessfully) to make the whitespace in generated file B match file A.

Here's how it works: the user is given an option to add new <Slide> elements to the XML file. After new slides have been added, the user has the option to add new <Item> elements to the XML file as children of a <Slide> element.

When I add a <Slide> element to file B it works like a charm. I can even add a new <Item> element with no problems. However, when I try to access the new <Identifier> element I just added to file B, using the second PHP script below with $orders != 'remove', I miss the node by one and select <Information/> instead.

It appears that the manually created file A has whitespace that is not present in my generated file B. I have experimented with the preserveWhiteSpace property, but it did not help.
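The suspected off-by-one can be reproduced in isolation. The following is a minimal sketch, not part of the scripts in this post: the inline XML strings and the helper nodeNameAt() are made up for illustration. It assumes the cause is that indentation between tags is parsed as text nodes, and it also shows that preserveWhiteSpace only has an effect when it is set before the document is loaded.

```php
<?php
// Hand-indented markup (like file A) versus markup without whitespace (like file B).
$indented = "<Item1>\n    <Identifier>Aortic Arch</Identifier>\n    <Information/>\n</Item1>";
$compact  = "<Item1><Identifier>Orbit</Identifier><Information/></Item1>";

// Hypothetical helper: name of the node at childNodes->item($index) of the root element.
function nodeNameAt(string $xml, int $index, bool $preserve = true): string
{
    $dom = new DOMDocument();
    // Must be set before loading; it does not change an already parsed document.
    $dom->preserveWhiteSpace = $preserve;
    $dom->loadXML($xml);

    return $dom->documentElement->childNodes->item($index)->nodeName;
}

echo nodeNameAt($indented, 0), "\n";        // #text (the newline and indentation)
echo nodeNameAt($indented, 1), "\n";        // Identifier
echo nodeNameAt($compact, 1), "\n";         // Information
echo nodeNameAt($indented, 1, false), "\n"; // Information
```

If this matches what the real files do, item(1) points at <Identifier> only when a whitespace text node happens to occupy index 0, which would explain why the generated file selects <Information/>.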

Are there any suggestions on how I can correct this problem? Constructive criticism is also welcome, as this is my first shot at dynamic XML manipulation. I apologize for the length and appreciate your time!

File A - Created Manually - I am trying to match this file!

<?xml version="1.0" encoding="UTF-8"?>
<root>
<Areas>Head &amp; Neck</Areas>
<Area>Head &amp; Neck</Area>
<Type>Angiograph</Type>
<Slide>Ag-01a
    <Title>Catheter Angiography</Title>
    <Item1>
        <Identifier interestCoord=".51,.73" locator="point" labelBool="true" labelTxt="" leaderBool="true">Aortic Arch
        </Identifier>
        <Information/>
        <Question A="" B="" C="" D="" E="" Answer=""/>
    </Item1>

             .... More Items 

File B - Before the user adds a <Slide>. This portion is created manually; a template, if you will. After the user enters slide names, new slides are generated using the chunk of code below.

<?xml version="1.0" encoding="UTF-8"?>
<root>
<Areas>Head &amp; Neck</Areas>
<Area>Head &amp; Neck</Area>
<Type>Brain Sections</Type>
</root>

File B - After the user adds a new <Slide> and <Item>. The formatting shown is the formatting created by DOMDocument. I think this is where the error is occurring: whitespace!

<Slide>Ag-09a
    <Title>Catheter Angiography</Title>
<Item1><Identifier locator="point" interestCoord="0.143,0.65" labelBool="true" labelTxt="" leaderBool="false">Orbit</Identifier><Information/><Question A="" B="" C="" D="" E="" Answer=""/></Item1></Slide>

PHP script used to add new <Slide> elements to XML

<?php
session_start();

//Constants
$SECTION_SEP = "========================================================================</br>";

//Variables used to construct file path
$area = trim($_POST['area']);
$slideType = trim($_POST['slideType']);

$rawSlides = trim($_POST['theseSlides']);
$newSlideList = explode(",", $rawSlides);

$fileLocation = "../XML/".$area."/".$slideType."/".$area.".XML";

$dom = new DOMDocument();
echo('New DOMDocument created!</br>');

$dom->load($fileLocation);
echo('XML file loaded!</br>');

/*$dom->preserveWhiteSpace = false;
echo('White space removed!</br>');*/

$dom->documentElement;
echo('DOM initialized!</br>');

if ($dom->getElementsByTagName('Slide')->length == 0){  //New file with no slides

foreach ($newSlideList as $slide){
    $newSlide = $dom->createElement('Slide', $slide);
    $newTitle = $dom->createElement('Title', 'Scan');

    //Add the title element to the Item
    $newSlide->appendChild($newTitle);

    $dom->childNodes->item(0)->appendChild($newSlide);
    echo($slide." has been added to the list!</br>");
 }

} else {
$locators = $dom->getElementsByTagName('Slide');

}

if($dom->save($fileLocation)){
echo("File saved successfully!!");

}else echo("There was a problem saving the file!");

PHP script used to add, edit, or remove <Item> and <Identifier> nodes, depending on the value of $orders. Warning: lengthy!

<?php
session_start();

//Constants
$SECTION_SEP = "========================================================================</br>";

//Variables used to construct file path
$area = trim($_POST['area']);
$slideType = trim($_POST['slideType']);

$fileLocation = "../XML/".$area."/".$slideType."/".$area.".XML";

//echo("File location:".$fileLocation);

//Current data (c_ for current)
$c_poi = "";
$c_type = "";
$c_lblBool = "";
$c_lblOverride = "";
$c_leaderBool = "";

//Determine if this visit is for new or old data
$orders = trim($_POST['orders']);

//Variables used to replace information in XML file loaded below (n_ for new)
$n_slideName = trim($_POST['slideName']); //slide name in view format ie Ag-01a
$n_identName = trim($_POST['ident']); //contains multiple information separated by comma ie 0,Aortic Arch
$n_type = trim($_POST['type']); //locator type
$n_poi = trim($_POST['poi']);
$n_lblBool = trim($_POST['lblBool']);
$n_lblOverride = trim($_POST['lblOverride']);

echo("Modified: ".date('c')."</br>");
$dom = new DOMDocument();
echo('New DOMDocument created!</br>');

$dom->load($fileLocation);
echo('XML file loaded!</br>');

/*$dom->preserveWhiteSpace = false;
echo('White space removed!</br>');*/

$dom->documentElement;
echo('DOM initialized!</br>');

$locators = $dom->getElementsByTagName('Slide');
echo($locators->length.' elements retrieved</br>');

$slideEntryFound = false;
$identEntryFound = false;
$identAttributesFound = false;
echo($SECTION_SEP);

//Locate the correct slide node
foreach ($locators as $locator){

//If there is a match, store the information
// rawSlide[x].childNode[0].nodeValue
if(strcmp(trim($locator->childNodes->item(0)->nodeValue),$n_slideName) == 0){

    $slideEntryFound = true;
    $slideChildren = $locator->childNodes;

    //Locate the correct identifier node
    foreach($slideChildren as $child){

        if( strcmp(trim($child->nodeValue), substr($n_identName,strpos($n_identName,",")+1)) == 0){
            $identEntryFound = true;
            if (strcmp($orders, "remove") == 0){//Removing an element


                echo("The identifier being removed is: ".trim($child->nodeValue."</br>"));
                echo("The node path is: ".($child->childNodes->item(1)->getNodePath())."</br>");
                echo($SECTION_SEP);

                $locator->removeChild($child);
                echo("Identifier successfully removed!</br>");
                echo($SECTION_SEP);
                break;

            } else {//Not removing anything - Adding or Editing

                echo("The identifier being modified is: ".trim($child->nodeValue."</br>"));
                echo("The node path is: ".($child->childNodes->item(1)->getNodePath())."</br>");
                echo($SECTION_SEP);

                if($child->childNodes->item(1)->hasAttributes()){

                    $identAttributesFound = true;

                    $c_poi = $child->childNodes->item(1)->getAttribute('interestCoord');
                     echo("--Current interestCoord: ".$c_poi."</br>");
                     echo("++New interestCoord: ".$n_poi."</br>");
                    if(strcmp($c_poi, $n_poi) != 0){
                       $child->childNodes->item(1)->setAttribute('interestCoord',$n_poi);
                    }

                    $c_type = $child->childNodes->item(1)->getAttribute('locator');
                     echo("--Current locator: ".$c_type."</br>");
                     echo("++New locator: ".$n_type."</br>");

                    $c_lblBool = $child->childNodes->item(1)->getAttribute('labelBool');
                     echo("--Current labelBool: ".$c_lblBool."</br>");
                     //echo("++New labelBool: ".$n_lblBool."</br>");

                    $c_lblOverride = $child->childNodes->item(1)->getAttribute('labelTxt');
                     echo("--Current labelOverride: ".$c_lblOverride."</br>");
                     echo("++New labelOverride: ".$n_lblOverride."</br>");

                    $c_leaderBool = $child->childNodes->item(1)->getAttribute('leaderBool');
                     echo("--Current leaderBool: ".$c_leaderBool."</br>");
                     //echo("++New leaderBool: ".$n_leaderBool."</br>");

                    if($n_lblOverride != ""){
                        echo("**A new label override was detected. The identifier will have the alias ".$n_lblOverride.".");
                    }
               break;
            } else echo("Fatal Error - Node does not contain attributes!</br>");

            if($identEntryFound == true && $identAttributesFound == false)
                echo("Error - Attribute entry not found!");
            break;
            }
        }

    }

    if($slideEntryFound == true && $identEntryFound == false && $orders != "remove"){

        echo("The identifier was not found... creating a new identifier!</br>");

     //Create a new Element

        $newElement = $dom->createElement("Item".((integer)(substr($n_identName,0,strpos($n_identName,",")))+1));

        echo("New element created!!</br>");

        //Create new Item children
        $newSubElem = $dom->createElement("Identifier", substr($n_identName,strpos($n_identName,",")+1));
        $newSubElem->setAttribute('locator',$n_type);
        $newSubElem ->setAttribute('interestCoord',$n_poi);
        $newSubElem->setAttribute('labelBool', $n_lblBool);
        $newSubElem->setAttribute('labelTxt', $n_lblOverride);
        //TODO link this next one to a variable instead of hard coding
        $newSubElem->setAttribute('leaderBool', "false");

        //Info Child
        $newInfoElem = $dom->createElement("Information");
        //Question Child
        $newQuestion = $dom->createElement("Question");
            $newQuestion->setAttribute('A', "");
            $newQuestion->setAttribute('B', "");
            $newQuestion->setAttribute('C', "");
            $newQuestion->setAttribute('D', "");
            $newQuestion->setAttribute('E', "");
            $newQuestion->setAttribute('Answer', "");

        //Add new children to main Item
        $newElement->appendChild($newSubElem);
        $newElement->appendChild($newInfoElem);
        $newElement->appendChild($newQuestion);

        $locator->appendChild($newElement);


        echo("New identifier added!!</br>");

    break;
    }
} else {

}
}
if($slideEntryFound == false)
echo("Error - Slide entry not found!");

if($dom->save($fileLocation)){
echo("File saved successfully!!");

echo('<div id="phpHandleBtns"></br><form><button type="submit" id="continueEdit" formaction="../edit.php">Continue Editing</button>'.
    '</br><button type="submit" id="doneEdit" formaction="../main.php">Done Editing</button></form></div>');

}else echo("There was a problem saving the file!");

?>

Solution

  • I would strongly recommend that you use an XPath API such as DOMXPath (http://php.net/manual/en/class.domxpath.php) to find the nodes you are interested in. Addressing nodes by position through the DOM API directly is only going to cause you heartache.

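    More specifically, I think that your childNodes->item(1) lookups are getting tripped up by whitespace. In file A the indentation before <Identifier> is a text node at index 0, so <Identifier> sits at index 1; in the generated file B there is no such text node, so index 1 is <Information/>. If you selected child elements only (I am not sure the DOM API offers that directly, but with XPath it is easy), the lookup would just ignore any whitespace.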
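The XPath suggestion above can be sketched as follows. This is an illustration under assumptions, not a drop-in replacement for the script in the question: the inline XML is a trimmed-down mix of the two file styles, and findIdentifier() is a hypothetical helper name. It relies only on DOMXPath::query() and DOMXPath::evaluate().

```php
<?php
// One hand-indented slide (file A style) and one generated slide (file B style).
$xml = <<<'XML'
<root>
<Slide>Ag-01a
    <Title>Catheter Angiography</Title>
    <Item1>
        <Identifier interestCoord=".51,.73" locator="point">Aortic Arch
        </Identifier>
        <Information/>
    </Item1>
</Slide>
<Slide>Ag-09a<Title>Catheter Angiography</Title><Item1><Identifier interestCoord="0.143,0.65" locator="point">Orbit</Identifier><Information/></Item1></Slide>
</root>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

// Hypothetical helper: find an <Identifier> by slide name and label text.
function findIdentifier(DOMXPath $xpath, string $slide, string $label): ?DOMElement
{
    foreach ($xpath->query('//Slide') as $slideNode) {
        // The slide name is the first text node inside <Slide>.
        $name = trim($xpath->evaluate('string(text()[1])', $slideNode));
        if ($name !== $slide) {
            continue;
        }
        // Element steps skip text nodes, so indentation cannot shift the match.
        foreach ($xpath->query('*/Identifier', $slideNode) as $identifier) {
            if (trim($identifier->textContent) === $label) {
                return $identifier;
            }
        }
    }

    return null;
}

$a = findIdentifier($xpath, 'Ag-01a', 'Aortic Arch');
$b = findIdentifier($xpath, 'Ag-09a', 'Orbit');

echo $a->getAttribute('interestCoord'), "\n"; // .51,.73
echo $b->getAttribute('interestCoord'), "\n"; // 0.143,0.65
```

Because the element is selected by name rather than by index, the same lookup works on both the indented and the unindented slide, and setAttribute() can then be called on the returned element directly.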