I'm trying to parse an XML document (the content.xml of an ODT file).
$reader = new XMLReader();
if (!$reader->open("content.xml")) die("Failed to open 'content.xml'");
// step through text:h and text:p elements to put them into an array
while ($reader->read()) {
    if ($reader->nodeType == XMLReader::ELEMENT && ($reader->name === 'text:h' || $reader->name === 'text:p')) {
        echo $reader->expand()->textContent; // Put the text into array in correct order...
    }
}
$reader->close();
First of all, I just need a small hint on how to step through the elements of the XML file correctly. In my attempt I can step through the text:h elements, but how do I get the other elements (text:p) without messing everything up?
I'll also show you my final goal. Please don't think that I'm asking for a complete solution; I only wrote everything down to show which structure I need. I want to solve this problem step by step.
The content of this XML file is something like:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
[...]
<office:body>
<office:text text:use-soft-page-breaks="true">
<text:h text:style-name="P1" text:outline-level="2">Chapter 1</text:h>
<text:p text:style-name="Standard">Lorem ipsum. </text:p>
<text:h text:style-name="Heading3" text:outline-level="3">Subtitle 1</text:h>
<text:p text:style-name="Standard"><text:span text:style-name="T2">Something 1:</text:span> Lorem.</text:p>
<text:p text:style-name="Standard"><text:span text:style-name="T3">Something 2:</text:span><text:s/>Lorem ipsum.</text:p>
<text:p text:style-name="Standard"><text:span text:style-name="T4">Something 3:</text:span> Lorem ipsum.</text:p>
<text:h text:style-name="Heading3" text:outline-level="3">Subtitle 2</text:h>
<text:p text:style-name="Standard"><text:span text:style-name="T5">10</text:span><text:span text:style-name="T6">:</text:span><text:s/>Text (100%)</text:p>
<text:p text:style-name="Explanation">Further informations.</text:p>
<text:p text:style-name="Standard">9.7:<text:s/>Text (97%)</text:p>
<text:p text:style-name="Explanation">Further informations.</text:p>
<text:p text:style-name="Standard"><text:span text:style-name="T9">9.1:</text:span><text:s/>Text (91%)</text:p>
<text:p text:style-name="Explanation">Further informations.</text:p>
<text:p text:style-name="Explanation">More furter informations.</text:p>
[Subtitle 3 and 4]
<text:h text:style-name="Heading3" text:outline-level="3">Subtitle 5</text:h>
<text:p text:style-name="Standard"><text:span text:style-name="T5">10</text:span><text:span text:style-name="T6">:</text:span><text:s/>Text (100%)</text:p>
<text:p text:style-name="Explanation">Further informations.</text:p>
<text:p text:style-name="Standard">9.7:<text:s/>Text (97%)</text:p>
<text:p text:style-name="Explanation">Further informations.</text:p>
<text:p text:style-name="Standard"><text:span text:style-name="T9">9.1:</text:span><text:s/>Text (91%)</text:p>
<text:p text:style-name="Explanation">Further informations.</text:p>
<text:p text:style-name="Explanation">More furter informations.</text:p>
<text:h text:style-name="Heading3" text:outline-level="3">References</text:h>
<text:list text:style-name="LFO44" text:continue-numbering="true">
<text:list-item><text:p text:style-name="P25">blabla et al., Any Title p. 580-586</text:p></text:list-item>
<text:list-item><text:p text:style-name="P25">blabla et al., Any Title p. 580-586</text:p></text:list-item>
<text:list-item><text:p text:style-name="P25">blabla et al., Any Title p. 580-586</text:p></text:list-item>
<text:list-item><text:p text:style-name="P25">blabla et al., Any Title p. 580-586</text:p></text:list-item>
</text:list>
[Multiple Chapter like this]
</office:text>
</office:body>
As you can see, each "subchapter" consists of standard elements, each optionally followed by one or more explanation elements. This structure is always the same.
My final goal is to split all of this information into an array like this:
array() {
[1]=>
array() {
["chapter"]=>
string() "Chapter 1"
["content"]=>
array() {
[0]=>
array() {
["subchapter"]=>
string() "Description"
["content"]=>
array() {
[0]=>
array() {
["standard"]=>
string() "Lorem ipsum."
["explanation"]=>
string(0) ""
}
}
}
[1]=>
array() {
["subchapter"]=>
string() "Subtitle 1"
["content"]=>
array() {
[0]=>
array() {
["standard"]=>
string() "Something 1: Lorem."
["explanation"]=>
string() ""
}
[1]=>
array() {
["standard"]=>
string() "Something 2: Lorem ipsum."
["explanation"]=>
string() ""
}
[2]=>
array() {
["standard"]=>
string() "Something 3: Lorem ipsum."
["explanation"]=>
string() ""
}
}
}
[2]=>
array() {
["subchapter"]=>
string() "Subtitle 2"
["content"]=>
array() {
[0]=>
array() {
["standard"]=>
string() "10: Text (100%)"
["explanation"]=>
string() "Further informations."
}
[and so on]
edit:
I can see your issue now, thanks for editing the question.
Inside your while loop
while ($reader->read()){
}
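you have a couple of properties and methods available to get the nodes and values:
$reader->readString()
will give the text of the current element (e.g. 'Subtitle 1'). Note that $reader->value is empty while the reader is positioned on an element node; it only holds the text once you reach the child text node.
$reader->getAttribute('text:style-name')
should get the 'Heading3' part.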
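As a small illustration of those accessors, here is a sketch that runs against an inline sample instead of content.xml. The sample string, the <root> wrapper and the $rows array are made up for this example; the namespace URI is the ODF text namespace.

```php
<?php
// Hypothetical sample document; a real content.xml declares many more namespaces.
$xml = '<root xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">'
     . '<text:h text:style-name="Heading3" text:outline-level="3">Subtitle 1</text:h>'
     . '<text:p text:style-name="Standard">'
     . '<text:span text:style-name="T2">Something 1:</text:span> Lorem.</text:p>'
     . '</root>';

$reader = new XMLReader();
$reader->XML($xml); // load from a string instead of open('content.xml')

$rows = [];
while ($reader->read()) {
    // Only look at opening tags of headings and paragraphs.
    if ($reader->nodeType === XMLReader::ELEMENT
        && ($reader->name === 'text:h' || $reader->name === 'text:p')) {
        $rows[] = [
            'name'  => $reader->name,
            'style' => $reader->getAttribute('text:style-name'),
            'value' => $reader->value,        // empty on an element node
            'text'  => $reader->readString(), // text of the element and its descendants
        ];
    }
}
$reader->close();

print_r($rows);
```

The nested text:span is skipped by the name check, but its text still ends up in the paragraph's 'text' entry because readString() collects the text of all descendants.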
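Putting it all together, you probably want something like this [pseudocode]. Everything except the index initialisation goes inside the while loop:
// set an index (once, BEFORE the while loop, otherwise it is reset on every node)
$i = 0;
// get the parts from the XML we need
$name = $reader->name;
$attrib = $reader->getAttribute('text:style-name');
$value = $reader->readString(); // $reader->value is empty on element nodes
// if the style is 'P1', increment our index, as we need a new top-level entry in our array
if ($attrib === 'P1') {
    $i++;
}
// append, so that several nodes with the same style do not overwrite each other
$array[$i][$attrib][] = $value;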
Note that this only handles one level of nesting. It looks like you need four levels, so you should probably have four indexes [$i, $j, $k, $l] and check each one against the thing that starts a new level (P1, Heading3, etc.).
You might end up with
$array[$i][$j][$k] = $reader->readString();
or the like. Remember to reset all your sub-indexes when you increment a higher index (e.g. if you $i++, set $j = 0, $k = 0, etc.).
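The counter idea could be sketched roughly as follows. This is only one possible reading of the target structure, not a finished solution. The function name odtToArray is hypothetical, and the sketch assumes that text:outline-level="2" marks a chapter, that any deeper heading marks a subchapter, that the style name "Explanation" marks an explanation, and that paragraphs before the first subchapter heading go under a subchapter called "Description" (taken from the desired output in the question). It keys on the outline level rather than on P1 because style names such as P1 may differ between documents.

```php
<?php
// Hypothetical helper: builds chapter > subchapter > standard/explanation entries.
function odtToArray(string $xml): array
{
    $reader = new XMLReader();
    $reader->XML($xml);

    $result = [];
    $i = 0;   // chapter index (1 after the first chapter heading)
    $j = -1;  // subchapter index inside the current chapter
    $k = -1;  // entry index inside the current subchapter

    while ($reader->read()) {
        if ($reader->nodeType !== XMLReader::ELEMENT) {
            continue;
        }
        if ($reader->name === 'text:h') {
            $text  = trim($reader->readString());
            $level = (int) $reader->getAttribute('text:outline-level');
            if ($level === 2) {        // assumption: level 2 starts a chapter
                $i++;
                $j = -1;               // reset the lower indexes
                $k = -1;
                $result[$i] = ['chapter' => $text, 'content' => []];
            } elseif ($i > 0) {        // assumption: other headings start a subchapter
                $j++;
                $k = -1;
                $result[$i]['content'][$j] = ['subchapter' => $text, 'content' => []];
            }
        } elseif ($reader->name === 'text:p' && $i > 0) {
            $text = trim($reader->readString());
            if ($j < 0) {              // paragraph before any subchapter heading
                $j = 0;
                $result[$i]['content'][$j] = ['subchapter' => 'Description', 'content' => []];
            }
            $isExplanation = $reader->getAttribute('text:style-name') === 'Explanation';
            if ($isExplanation && $k >= 0) {
                // attach to the most recent standard entry, joining several with a space
                $entry = &$result[$i]['content'][$j]['content'][$k];
                $entry['explanation'] = ltrim($entry['explanation'] . ' ' . $text);
                unset($entry);
            } else {
                // an explanation with no preceding standard entry also lands here
                $k++;
                $result[$i]['content'][$j]['content'][$k] = ['standard' => $text, 'explanation' => ''];
            }
        }
    }
    $reader->close();
    return $result;
}

// Made-up sample in the shape of the question's XML.
$sample = '<office:text xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
        . ' xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">'
        . '<text:h text:style-name="P1" text:outline-level="2">Chapter 1</text:h>'
        . '<text:p text:style-name="Standard">Lorem ipsum. </text:p>'
        . '<text:h text:style-name="Heading3" text:outline-level="3">Subtitle 2</text:h>'
        . '<text:p text:style-name="Standard">9.1: Text (91%)</text:p>'
        . '<text:p text:style-name="Explanation">Further informations.</text:p>'
        . '<text:p text:style-name="Explanation">More further informations.</text:p>'
        . '</office:text>';

$tree = odtToArray($sample);
print_r($tree);
```

Two limitations to be aware of: readString() does not turn <text:s/> into a space, so "9.7:<text:s/>Text" would come out as "9.7:Text", and the text:p elements inside the References list are treated like ordinary standard paragraphs.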
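Previous answers below:
SimpleXML could (probably) do this in a few lines if the XML were already nested the way you want and did not use namespaces: http://php.net/manual/en/book.simplexml.php. As far as I know, the json_encode conversion below leaves out namespace-prefixed elements such as text:h, so for an ODT content.xml it is likely to return a mostly empty array.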
$xml = simplexml_load_file('content.xml');
$json = json_encode($xml);
$array = json_decode($json,TRUE);
print_r($array);
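edit: you can also use XPath with SimpleXML. Because every element here carries a namespace prefix, plain property access such as $xml->{'office:body'} will not find anything; query by prefix instead (this relies on the prefixes being declared on the root element, as they normally are in content.xml):
foreach ($xml->xpath('//text:h') as $h) { echo $h, "\n"; }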
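For namespaced documents like this one, a SimpleXML sketch might look like the following. The sample string and the $headings array are made up for illustration. Registering the text prefix explicitly is an extra precaution so the query does not depend on which prefixes the document itself declares.

```php
<?php
// Made-up sample in the shape of an ODT content.xml.
$sample = '<office:document-content'
        . ' xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
        . ' xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">'
        . '<office:body><office:text>'
        . '<text:h text:outline-level="2">Chapter 1</text:h>'
        . '<text:p text:style-name="Standard">Lorem ipsum.</text:p>'
        . '<text:h text:outline-level="3">Subtitle 1</text:h>'
        . '</office:text></office:body>'
        . '</office:document-content>';

$xml = simplexml_load_string($sample);

// Bind the prefix used in the XPath expression to the ODF text namespace.
$xml->registerXPathNamespace('text', 'urn:oasis:names:tc:opendocument:xmlns:text:1.0');

$headings = [];
foreach ($xml->xpath('//text:h') as $h) {
    // Namespaced attributes are read via attributes(prefix, true).
    $level = (int) $h->attributes('text', true)['outline-level'];
    $headings[] = ['level' => $level, 'title' => (string) $h];
}

print_r($headings);
```

This collects the headings in document order, but it still leaves the grouping of paragraphs under each heading to you, which is why the XMLReader loop is probably the simpler route for the nested array.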