
PHP SimpleXML: access node by attribute value


I have the following XML (this is just the header):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE tmx SYSTEM "tmx14.dtd">
<tmx version="1.4">
<header creationtool="TDC Analysis Package" creationtoolversion="org.gs4tr.tm3.tmx.Version" segtype="sentence" o-tmf="unknown" adminlang="EN-US" srclang="EN" datatype="unknown" creationdate="20221006T184234Z">
    <prop type="x-Recognizers">RecognizeAll</prop>
    <prop type="x-IncludesContextContent">True</prop>
    <prop type="x-TMName">sample_tmx.tmx</prop>
    <prop type="x-TokenizerFlags">DefaultFlags</prop>
    <prop type="x-WordCountFlags">DefaultFlags</prop>
</header>

I need to first check if the prop node whose type attribute is 'x-TMName' exists. (Some files have it, others don't.) If it exists, I need to update it. If it doesn't exist, I need to create it.

I parse these files using SimpleXML. At first, I was using this:

if(isset($xmlObj->header->prop[2])) { 
    $TM_name = $xmlObj->header->prop[2];
}

to access it and this to update it:

$xmlObj->header->prop[2] = $new_TM_name;

However, this is not reliable because there may be a different prop in the third position. So, after a lot of reading and experimenting, I managed to check if it exists and read it like this:

$fileName = "INVERTED_sample_tmx.tmx";

/* read TMX file contents */
$uploadedFile = "user_files/".$fileName;
$xmlStr = file_get_contents($uploadedFile);
$xmlObj = simplexml_load_string($xmlStr);

/* extract <header> <prop>s */
$props = $xmlObj->header->prop;
/* loop them to check if the type='x-TMName' attribute exists */
foreach($props as $prop) {
    $type = (string)$prop->attributes()->type;
    /* if it exists, assign its value to the $TM_name variable */
    if($type == "x-TMName") {
        $TM_name = $prop;
        //$new_TM_name = "INVERTED ".$TM_name;
        //var_dump($prop);
    }
}

If the prop node whose type attribute is 'x-TMName' does not exist, i.e. if the $TM_name variable is not set, I create it:

if(isset($TM_name)) {
    $new_TM_name = "INVERTED ".$TM_name;
    echo "Old TM name is ".$TM_name." and new TM name is ".$new_TM_name;
}
else {
    /* use the fileName as the TM name */
    $new_TM_name = "INVERTED ".$fileName;
    if($new_prop = $xmlObj->header->addChild('prop', $new_TM_name)) {
        $new_prop->addAttribute('type', "x-TMName");
        echo "new prop added";
    }
}

The problem with this approach is that I can't update the node with the $new_TM_name because I don't know how to access it. When I tried to update it within the foreach loop above using $prop = $new_TM_name, the page got stuck in an endless loop of printing. And I don't know how to access it outside of the foreach loop either. I've tried:

$headerProps = $xmlObj->header->prop['x-TMName']
or
$headerProps = $xmlObj->header->prop->{'x-TMName'}
or
$headerProps = (string)$xmlObj->header->prop->attributes()->type->{'x-TMName'};
var_dump($headerProps);

These all print empty. EDIT: Although my problem has now been resolved, I still don't understand why the above method(s) do not work to select the element I want, and I have to use a foreach loop. I found dozens of answers on here that offer something along those lines as an answer to questions similar to mine. Can somebody please explain what is wrong in them? END-OF-EDIT.

I am pretty new to working with XML and SimpleXML, and I keep reading, but I just can't seem to find the right way. Please help.


Solution

  • As far as I can tell, the code that creates a new node does what you would expect, so the main problem seems to be updating the value of the existing node when it exists.

    This version relies on a 'feature' of SimpleXML: assigning to index 0 of an element sets the value of that element itself.

    In the loop, $prop is a SimpleXMLElement object, so to set the value of the node (rather than just reassign the $prop variable) you can use $prop[0]:

    foreach($props as $prop) {
        /* if exists, assign its value to the $TM_name variable */
        if((string)$prop['type'] == "x-TMName") {
            $TM_name = (string)$prop;
            // Update the node value
            $prop[0] = "INVERTED ".$TM_name;
        }
    }
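
Putting the update and the create branches together, a minimal self-contained sketch could look like the following. The function name setTmName, its parameters, and the shortened sample XML are assumptions made for illustration; they are not part of the original code.

```php
<?php
// Shortened sample based on the header in the question (DOCTYPE omitted).
$xmlStr = <<<'XML'
<tmx version="1.4">
<header srclang="EN">
    <prop type="x-Recognizers">RecognizeAll</prop>
    <prop type="x-TMName">sample_tmx.tmx</prop>
</header>
</tmx>
XML;

/**
 * Hypothetical helper: prefix the existing x-TMName prop,
 * or create the prop from $fallback when none exists.
 */
function setTmName(SimpleXMLElement $xmlObj, string $prefix, string $fallback): string
{
    foreach ($xmlObj->header->prop as $prop) {
        if ((string) $prop['type'] === 'x-TMName') {
            $newName = $prefix . (string) $prop;
            // $prop[0] writes to the node itself; "$prop = ..." would only rebind the variable.
            $prop[0] = $newName;
            return $newName;
        }
    }

    // No matching prop was found: append one to <header>.
    $newName = $prefix . $fallback;
    $newProp = $xmlObj->header->addChild('prop', $newName);
    $newProp->addAttribute('type', 'x-TMName');
    return $newName;
}

$xmlObj = simplexml_load_string($xmlStr);
$newName = setTmName($xmlObj, 'INVERTED ', 'sample_tmx.tmx');

echo $newName, PHP_EOL;
echo $xmlObj->asXML();
```

Returning from inside the loop as soon as the prop is found means the create branch only runs when no match exists, so no separate isset() check is needed.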
    

    You could also use XPath to find the element instead of looping. The resulting code is shorter, but XPath is a separate query language that you would need to learn.
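
For comparison, here is a rough sketch of the XPath route using SimpleXMLElement::xpath(). The shortened sample XML and the variable names are assumptions for illustration, and the expression assumes the document uses no XML namespaces, as in the header shown in the question.

```php
<?php
// Shortened sample based on the header in the question (DOCTYPE omitted).
$xmlStr = <<<'XML'
<tmx version="1.4">
<header srclang="EN">
    <prop type="x-Recognizers">RecognizeAll</prop>
    <prop type="x-TMName">sample_tmx.tmx</prop>
</header>
</tmx>
XML;

$xmlObj = simplexml_load_string($xmlStr);

// Select every <prop> under <header> whose type attribute equals "x-TMName".
$matches = $xmlObj->xpath('/tmx/header/prop[@type="x-TMName"]');

if (!empty($matches)) {
    // xpath() returns an array of SimpleXMLElement objects tied to the same document,
    // so writing through index 0 updates the original tree.
    $matches[0][0] = 'INVERTED ' . (string) $matches[0];
} else {
    $newProp = $xmlObj->header->addChild('prop', 'INVERTED sample_tmx.tmx');
    $newProp->addAttribute('type', 'x-TMName');
}

// For contrast: a string key in square brackets reads an attribute by name
// from the first <prop>; it does not filter props by attribute value.
$firstType = (string) $xmlObj->header->prop['type'];

echo $xmlObj->header->asXML(), PHP_EOL;
```

This also suggests why the attempts in the question come back empty: $xmlObj->header->prop['x-TMName'] looks for an attribute named x-TMName on the first prop, and ->{'x-TMName'} looks for a child element with that name. Neither syntax selects by attribute value, which is what the [@type="x-TMName"] predicate does.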