
Getting summarized nodes instead of separate nested nodes


I have a third-party XML file like the one below. It comes from a movie app that organizes which scene is shot on which shooting day.

<schedule>

<DayBreak>
    <AutoText>true</AutoText>
    <Text></Text>
</DayBreak>

<Scene>
    81
</Scene>

<DayBreak>
    <AutoText>false</AutoText>
    <Text>myday</Text>
</DayBreak>

<Scene>
    82
</Scene>

<Scene>
    85
</Scene>

<schedule /> 

As you can see, a new day starts and scene 81 is shot; then another day starts, on which scenes 82 and 85 are shot.

(If you ask me, I wouldn't structure the XML like this, but that's what I got from the other guys.)

Now, if I parse this XML file using PHP's SimpleXML, I get these arrays:

                [DayBreak] => Array
                    (
                        [0] => SimpleXMLElement Object
                            (
                                [AutoText] => true
                                [Text] => SimpleXMLElement Object
                                    (
                                    )

                            )

                        [1] => SimpleXMLElement Object
                            (
                                [AutoText] => false
                                [Text] => myday
                            )
                        )

                [Scene] => Array
                    (
                        [0] => 81
                        [1] => 82
                        [2] => 85
                    )

                [EndShooting] => SimpleXMLElement Object
                    (
                    )

As you can see, I can no longer tell which scene is shot on which day, because the array groups the elements by name instead of keeping them in document order.

What should I do?

thanks, Matt


Solution

  • One way would be to loop through the child nodes of schedule from top to bottom, fixing the XML along the way. When you reach a DayBreak node, create a new Day container node at that spot and move the DayBreak into it. The Scene nodes that follow are moved into the same Day node, until the next DayBreak starts a new one.

    This way you will have a better structure to work with.

    <?php
    $xml = <<<END
    <schedule>
    
    <DayBreak>
        <AutoText>true</AutoText>
        <Text></Text>
    </DayBreak>
    
    <Scene>
        81
    </Scene>
    
    <DayBreak>
        <AutoText>false</AutoText>
        <Text>myday</Text>
    </DayBreak>
    
    <Scene>
        82
    </Scene>
    
    <Scene>
        85
    </Scene>
    
    </schedule>
    END;
    // Note: Last line in XML edited (used to be "<schedule />")
    
    // Load
    $dom = new DOMDocument();
    $dom->loadXml($xml);
    
    // Grab schedule node
    $schedule = $dom->getElementsByTagName('schedule')->item(0);
    
    // Loop through the child nodes of schedule
    $i = 0;
    while ($i < $schedule->childNodes->length) {
        $childNode = $schedule->childNodes->item($i);
    
        switch ($childNode->nodeName) {
            case 'DayBreak':
                $dayBreak = $childNode;
                // Brand new day. Wrap DayBreak node in Day node, into which the
                // following scene nodes will be moved
                $day = $dom->createElement('Day');
                $schedule->insertBefore($day, $dayBreak);
                $day->appendChild($dayBreak);
                break;
    
            case 'Scene':
                // A scene shot on the current day. Move the node into the Day node.
                // (Assumes a DayBreak precedes the first Scene, so $day is already set.)
                $day->appendChild($childNode);
                continue 2; // Don't increase $i, as the node was moved out of 'schedule'
        }
    
        $i++;
    }
    
    // echo $dom->saveXML(); // Uncomment to view XML
    var_dump(new SimpleXMLElement($dom->saveXML()));
    

    Output:

    object(SimpleXMLElement)#5 (1) {
      ["Day"]=>
      array(2) {
        [0]=>
        object(SimpleXMLElement)#6 (2) {
          ["DayBreak"]=>
          object(SimpleXMLElement)#9 (2) {
            ["AutoText"]=>
            string(4) "true"
            ["Text"]=>
            object(SimpleXMLElement)#10 (0) {
            }
          }
          ["Scene"]=>
          string(8) "
        81
    "
        }
        [1]=>
        object(SimpleXMLElement)#8 (2) {
          ["DayBreak"]=>
          object(SimpleXMLElement)#9 (2) {
            ["AutoText"]=>
            string(5) "false"
            ["Text"]=>
            string(5) "myday"
          }
          ["Scene"]=>
          array(2) {
            [0]=>
            string(8) "
        82
    "
            [1]=>
            string(8) "
        85
    "
          }
        }
      }
    }
    

    Edit

    You can then loop through the days and scenes using the DOMDocument:

    foreach ($dom->getElementsByTagName('Day') as $day) {
        echo "Day:";
        foreach ($day->getElementsByTagName('Scene') as $scene) {
            echo ' ' . trim($scene->nodeValue);
        }
        echo PHP_EOL;
    }
    

    or a SimpleXMLElement object, if you are more familiar with that:

    $xml = new SimpleXMLElement($dom->saveXML());
    
    foreach ($xml->Day as $day) {
        echo "Day:";
        foreach ($day->Scene as $scene) {
            echo ' ' . trim($scene);
        }
        echo PHP_EOL;
    }
    

    Output (for both DOMDocument and SimpleXMLElement examples):

    Day: 81
    Day: 82 85
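
If you only need the day/scene grouping and don't need to restructure the XML, a lighter alternative is to iterate over the children with SimpleXML directly. The grouping by element name shows up in the print_r dump and in name-based access such as $xml->Scene, while children() walks the child elements in document order. The following is a minimal sketch under that assumption, not the approach shown above; the $days array and its 'label' and 'scenes' keys are illustrative names, and the XML is the corrected sample from the question.

```php
<?php
$xml = <<<XML
<schedule>
    <DayBreak>
        <AutoText>true</AutoText>
        <Text></Text>
    </DayBreak>
    <Scene>
        81
    </Scene>
    <DayBreak>
        <AutoText>false</AutoText>
        <Text>myday</Text>
    </DayBreak>
    <Scene>
        82
    </Scene>
    <Scene>
        85
    </Scene>
</schedule>
XML;

$schedule = new SimpleXMLElement($xml);

// Collect the scenes per day; a DayBreak starts a new day.
$days = [];
foreach ($schedule->children() as $child) {
    switch ($child->getName()) {
        case 'DayBreak':
            $days[] = [
                'label'  => trim((string) $child->Text),
                'scenes' => [],
            ];
            break;

        case 'Scene':
            // Scenes that appear before the first DayBreak are skipped here.
            if ($days) {
                $days[count($days) - 1]['scenes'][] = trim((string) $child);
            }
            break;
    }
}

foreach ($days as $day) {
    echo 'Day: ' . implode(' ', $day['scenes']) . PHP_EOL;
}
```

This prints the same two lines as the examples above ("Day: 81" and "Day: 82 85"). It leaves the original document untouched, which may be preferable when the file has to be written back in the third party's format.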