php xml simplexml children

php/simplexml adding elements before and after text


I'm trying to insert elements into an XML document around some text. Part of the problem may be that this is not well-formed XML, and the result needs to stay easy for a human to read as plain text. What I have is something like this:

<record>
  <letter>
    <header>To Alice from Bob</header>
    <body>Hi, how is it going?</body>
  </letter>
</record>

I need to end up with this:

<record>
  <letter>
    <header>To <to>Alice</to> from <from>Bob</from></header>
    <body>Hi, how is it going?</body>
  </letter>
</record>

Something similar should be valid html:

<p>To <span>Alice</span> from <span>Bob</span></p>

I can set the value of the header to a string, but the < and > characters are converted to &lt; and &gt;, which is no good. Right now I'm using $node->header->addChild('to', 'Alice') and $node[0]->header = 'plain text'.

If I do

$node->header->addChild('to', 'Alice'); 
$node->header = 'plain text';
$node->header->addChild('from', 'Bob'); 

Then I get

<header>plain text <from>Bob</from></header>

The 'to' is wiped out.

Quick and dirty ways are to just let it be

<header>plain text <to>Alice</to><from>Bob</from></header>

And then just open the file a second time and move the elements around. Or search for the &lt; and &gt; entities and replace them. That seems like the wrong way, though.

Is this possible with SimpleXML?

Thank you!


Solution

  • From the viewpoint of DOM (and SimpleXML is an abstraction on top of that), you do not insert elements around text. You replace a text node with a mix of text nodes and element nodes. SimpleXML has some problems with mixed child nodes, so you might want to use DOM directly. Here is a commented example:

    $xml = <<<'XML'
    <record>
      <letter>
        <header>To Alice from Bob</header>
        <body>Hi, how is it going?</body>
      </letter>
    </record>
    XML;
    
    // the words and the tags you would like to create
    $words = ['Alice' => 'to', 'Bob' => 'from'];
    // a split pattern, you could build this from the array
    $pattern = '((Alice|Bob))';
    
    // bootstrap the DOM
    $document = new DOMDocument();
    $document->loadXML($xml);
    $xpath = new DOMXPath($document);
    
    // iterate any text node with content
    foreach ($xpath->evaluate('//text()[normalize-space() != ""]') as $text) {
      // use the pattern to split the text into a list
      $parts = preg_split($pattern, $text->textContent, -1, PREG_SPLIT_DELIM_CAPTURE);
      // if the text was actually split
      if (count($parts) > 1) {
        // iterate the text parts
        foreach ($parts as $part) {
          // if it is a word from the list
          if (isset($words[$part])) {
            // add the new element node
            $wrap = $text->parentNode->insertBefore(
              $document->createElement($words[$part]),
              $text
            );
            // and add the text as a child node to it
            $wrap->appendChild($document->createTextNode($part));
          } else {
            // otherwise add the text as a new text node
            $text->parentNode->insertBefore(
              $document->createTextNode($part),
              $text
            );
          }
        }
        // remove the original text node
        $text->parentNode->removeChild($text);
      }
    }
    
    echo $document->saveXML();
    

    Output:

    <?xml version="1.0"?>
    <record>
      <letter>
        <header>To <to>Alice</to> from <from>Bob</from></header>
        <body>Hi, how is it going?</body>
      </letter>
    </record>
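
If the surrounding code already works with SimpleXML, the same replacement can probably be applied without reloading the document. The following is a minimal sketch, not part of the original answer. It assumes PHP 7.4 or later, and the helper name `wrapWords` is hypothetical. It uses `dom_import_simplexml()` to reach the DOM node behind a SimpleXMLElement, and it builds the split pattern from the array keys with `preg_quote()` instead of hard-coding it.

```php
<?php
// Hypothetical helper (an assumption, not from the original answer):
// wraps each known word found in a text node in the element named in $words.
function wrapWords(SimpleXMLElement $element, array $words): void
{
    if ($words === []) {
        return; // an empty word list would produce a useless pattern
    }

    // dom_import_simplexml() returns the DOM node of the same underlying
    // document, so changes made here are visible through $element too.
    $node = dom_import_simplexml($element);
    $document = $node->ownerDocument;
    $xpath = new DOMXPath($document);

    // Build the split pattern from the array keys; preg_quote() escapes
    // regex metacharacters and the "/" delimiter.
    $quoted = array_map(
        fn ($word) => preg_quote((string) $word, '/'),
        array_keys($words)
    );
    $pattern = '/(' . implode('|', $quoted) . ')/';

    // Text nodes below the imported element that are not whitespace only.
    $texts = $xpath->evaluate('.//text()[normalize-space() != ""]', $node);
    foreach ($texts as $text) {
        $parts = preg_split($pattern, $text->textContent, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) < 2) {
            continue; // no word matched in this text node
        }
        foreach ($parts as $part) {
            if ($part === '') {
                continue; // skip empty pieces at the start or end
            }
            if (isset($words[$part])) {
                // a known word: wrap it in a new element node
                $new = $document->createElement($words[$part]);
                $new->appendChild($document->createTextNode($part));
            } else {
                // anything else stays plain text
                $new = $document->createTextNode($part);
            }
            $text->parentNode->insertBefore($new, $text);
        }
        // the original text node has been fully replaced
        $text->parentNode->removeChild($text);
    }
}

$record = simplexml_load_string(
    '<record><letter><header>To Alice from Bob</header>'
    . '<body>Hi, how is it going?</body></letter></record>'
);
wrapWords($record, ['Alice' => 'to', 'Bob' => 'from']);

// prints: <header>To <to>Alice</to> from <from>Bob</from></header>
echo $record->letter->header->asXML(), "\n";
```

Unlike the example above, this sketch skips empty parts, so a word at the very end of the text does not leave an empty text node behind.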