Parsing XML: Pulling a separate value based on IDREF/ID


I've been struggling over this all day and in reality it's probably really simple... but I'm a complete beginner to the world of PHP and XML so could really do with some help.

I'm using SimpleXML to parse my data and have two second-level groups - (yearlist) and (eplist). I have (year) nested inside (yearlist) which has an attribute "yid", set as ID in my DTD. It also has (yearname) nested inside (year) which contains a more detailed description to be displayed as output. I have (ep) nested inside (eplist), with the attribute "yearid" (which correlates directly to "yid"), set as IDREF in my DTD.

Basically, when I'm parsing the data for (eplist), I want to use (yearname) as a group header - using yearid=yid>yearname as the path.

I've created an example of my data which may help explain my problem better.

Here is my DTD:

<?xml encoding="UTF-8"?>

<!ELEMENT besteplist (yearlist,eplist)>

<!ELEMENT yearlist (year)+>
<!ELEMENT year (yearname)>
<!ATTLIST year
            yid ID #REQUIRED>
<!ELEMENT yearname (#PCDATA)>

<!ELEMENT eplist (ep)+>
<!ELEMENT ep (eptitle,eptnumber)>
<!ATTLIST ep
            eid ID #REQUIRED
            yearid IDREF #IMPLIED>
<!ELEMENT eptitle (#PCDATA)>
<!ELEMENT eptnumber (#PCDATA)>

Here is my XML:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE besteplist SYSTEM "example.dtd">
<besteplist>
    <yearlist>
        <year yid="y1">
            <yearname>1995, Season 1</yearname>
        </year>
        <year yid="y2">
            <yearname>1996, Season 2</yearname>
        </year>
        <year yid="y3">
            <yearname>1997, Season 3</yearname>
        </year>
    </yearlist>
    <eplist>
        <ep yearid="y1" eid="e1">
            <eptitle>The First Episode</eptitle>
            <eptnumber>1</eptnumber>
        </ep>
        <ep yearid="y2" eid="e2">
            <eptitle>Bla bla bla</eptitle>
            <eptnumber>21</eptnumber>
        </ep>
        <ep yearid="y2" eid="e3">
            <eptitle>Rar rar rar</eptitle>
            <eptnumber>39</eptnumber>
        </ep>
        <ep yearid="y2" eid="e4">
            <eptitle>Tra la la</eptitle>
            <eptnumber>45</eptnumber>
        </ep>
        <ep yearid="y3" eid="e5">
            <eptitle>Donkey</eptitle>
            <eptnumber>126</eptnumber>
        </ep>
    </eplist>
</besteplist>

Here is an example of how I'd like the output to look:

SEASON: 1995, Season 1

    EPISODE TITLE: The First Episode
    EPISODE NUMBER: 1

SEASON: 1996, Season 2

    EPISODE TITLE: Bla bla bla
    EPISODE NUMBER: 21

    EPISODE TITLE: Rar rar rar
    EPISODE NUMBER: 39

    EPISODE TITLE: Tra la la
    EPISODE NUMBER: 45

SEASON: 1997, Season 3

    EPISODE TITLE: Donkey
    EPISODE NUMBER: 126

I don't think it'll be much use posting the code I've attempted already as it's probably fairly useless... what I have managed to do is the very basics. Once I've got this down I can move on to the next stage... formatting...

I'm not attached to SimpleXML in any way so if somebody can suggest a more efficient way of doing things, I'm all ears.

Thank you so much in advance to anybody who takes the time to help me out. :)

Sam


In response to @michi, I've been sat trying to work out xpath and reading all sorts of syntax/tutorials online and can't seem to get my head around it. This is what I have so far... but I've commented out the xpath as it's obviously wrong.

<?php
$xml = simplexml_load_file("example.xml") or die("Error: Cannot create object");

foreach ($xml->yearlist->children() as $years) {
    $xyid = $years['yid']; // attribute names must be quoted strings
    echo "_____________________________________________<br>";
    echo "(yid= " . $xyid . " )<br>";
    echo "SEASON: " . $years->yearname . "<br>";
    echo "_____________________________________________<br>";
    foreach ($xml->eplist->children() as $episodes) {
        echo "EPISODE TITLE: " . $episodes->eptitle . "<br>";
        echo "EPISODE NUMBER: " . $episodes->eptnumber . "<br>";
        $xyearid = $episodes['yearid'];
        echo "(yearid= " . $xyearid . " )<br>";
        // echo $xml->xpath('//year[@yid="$episodes[yearid]"]/yearname');
        echo "<br>";
    }
}

?>

I hope you can guide me in the right direction!

Thanks Sam


Thanks for the help michi - that's definitely a step in the right direction!

I'm trying to think of ways to only display the season name once... came across iterations and arrays but they all look too complicated for me. Is it possible to include xpath within a foreach command? I thought perhaps if I nested foreach episodes within foreach seasons and used xpath to match the ID it could work, but I can't seem to get it to show the elements. Am I on the right track?

<?php
$xml=simplexml_load_file("example.xml") or die("Error: Cannot create object");

foreach ($xml->yearlist->year as $season) {
    echo "SEASON: " . $season->yearname . PHP_EOL;
    foreach ($xml->xpath("//ep[@yearid='$season[yid]']")[0] as $episode) { 
        echo "EPISODE TITLE: " . $episode->eptitle . PHP_EOL;
        echo "EPISODE NUMBER: " . $episode->eptnumber . PHP_EOL; 
        echo PHP_EOL;
    }
}

?>

Thanks again!


Solution

  • You have mastered the basic SimpleXML techniques, good job. Now let's build on that:

    1. I suggest iterating over <eplist> and echoing only the <ep> elements first:

      $xml = simplexml_load_string($x); // assume XML in $x
      
      foreach ($xml->eplist->ep as $episode) { 
          echo $episode['yearid'] . PHP_EOL;
          echo "EPISODE TITLE: " . $episode->eptitle . PHP_EOL;
          echo "EPISODE NUMBER: " . $episode->eptnumber . PHP_EOL; 
          echo PHP_EOL;
      }
      

      PHP_EOL emits the correct newline sequence for the platform the script runs on; see When do I use the PHP constant "PHP_EOL"?

      see it in action: https://eval.in/464970

      This does look similar to what you want, doesn't it?

    2. Use the yearid attribute of <ep> as a key to look up and echo the corresponding <yearname>; xpath() does this.

      Your XPath expression is basically correct, but needs some changes:

      // old:
      echo $xml->xpath('//year[@yid="$episode[yearid]"]/yearname');
      
      // new:
      echo $xml->xpath("//year[@yid='$episode[yearid]']/yearname")[0];
      

      Swap " and ' so that PHP interpolates $episode[yearid]; variables are only expanded inside double-quoted strings. Note that I changed the variable name from $episodes to $episode in my code.
      See What is the difference between single-quoted and double-quoted strings in PHP?

      xpath() returns an array of SimpleXMLElement objects, so to access the first match we need to index the array with [0].

      Of course, this code is not error-proof: it doesn't check whether the array is empty, for example. You need to add such checks for production, but they would obscure the point of these examples.

      In the loop from step 1, replace the echo $episode['yearid'] line with the corrected xpath() call.

      see it working: https://eval.in/464992
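
The missing empty-array check mentioned above could look roughly like this. It is a minimal sketch, not the answer's original code: the helper name yearNameFor(), the fallback text and the orphaned y9 episode are assumptions made for illustration.

```php
<?php
// Hypothetical helper (not part of SimpleXML): returns the <yearname> text
// for a given yid, or a fallback string when no <year> matches.
function yearNameFor(SimpleXMLElement $xml, string $yid, string $fallback = 'Unknown season'): string
{
    $matches = $xml->xpath("//year[@yid='$yid']/yearname");

    // xpath() gives an empty array when nothing matches and a
    // non-array value on error; empty() covers both cases.
    if (empty($matches)) {
        return $fallback;
    }

    return (string) $matches[0];
}

// Trimmed sample data; the second <ep> points at a yid that does not exist.
$x = <<<XML
<besteplist>
  <yearlist>
    <year yid="y1"><yearname>1995, Season 1</yearname></year>
  </yearlist>
  <eplist>
    <ep yearid="y1" eid="e1"><eptitle>The First Episode</eptitle><eptnumber>1</eptnumber></ep>
    <ep yearid="y9" eid="e2"><eptitle>Orphan</eptitle><eptnumber>2</eptnumber></ep>
  </eplist>
</besteplist>
XML;

$xml = simplexml_load_string($x);

foreach ($xml->eplist->ep as $episode) {
    echo "SEASON: " . yearNameFor($xml, (string) $episode['yearid']) . PHP_EOL;
    echo "EPISODE TITLE: " . $episode->eptitle . PHP_EOL . PHP_EOL;
}
```

With this data the second episode is printed under the fallback text instead of triggering an undefined-offset warning on [0].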

    3. Up next: grouping episodes of the same season, i.e. echoing SEASON only for the first episode belonging to that season. (Your job.)

      Update:

      You posted almost perfect code, see my comment.

      Basically, you have two tables linked by yearid: one episode is linked to one year, and one year is linked to many episodes. You can either iterate over the years and select the linked episodes (= your last code example) or iterate over the episodes and select the linked year (= my code examples).
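
For the first direction (years first, then their episodes), a sketch along the lines of the nested-loop attempt in the question might look like the following. It assumes the sample XML from the question; the function name listBySeason() is made up for illustration. The one substantive change is dropping the [0] after xpath(): with [0], foreach walks the child elements of the first matching <ep> rather than the list of matching <ep> elements.

```php
<?php
// Sample data from the question, compacted.
$x = <<<XML
<besteplist>
  <yearlist>
    <year yid="y1"><yearname>1995, Season 1</yearname></year>
    <year yid="y2"><yearname>1996, Season 2</yearname></year>
    <year yid="y3"><yearname>1997, Season 3</yearname></year>
  </yearlist>
  <eplist>
    <ep yearid="y1" eid="e1"><eptitle>The First Episode</eptitle><eptnumber>1</eptnumber></ep>
    <ep yearid="y2" eid="e2"><eptitle>Bla bla bla</eptitle><eptnumber>21</eptnumber></ep>
    <ep yearid="y2" eid="e3"><eptitle>Rar rar rar</eptitle><eptnumber>39</eptnumber></ep>
    <ep yearid="y2" eid="e4"><eptitle>Tra la la</eptitle><eptnumber>45</eptnumber></ep>
    <ep yearid="y3" eid="e5"><eptitle>Donkey</eptitle><eptnumber>126</eptnumber></ep>
  </eplist>
</besteplist>
XML;

$xml = simplexml_load_string($x);

// Hypothetical helper: builds the grouped listing by iterating seasons first.
function listBySeason(SimpleXMLElement $xml): string
{
    $out = '';
    foreach ($xml->yearlist->year as $season) {
        $out .= "SEASON: " . $season->yearname . PHP_EOL . PHP_EOL;

        $yid = (string) $season['yid'];
        // No [0] here: loop over every <ep> the query returns.
        foreach ($xml->xpath("//ep[@yearid='$yid']") as $episode) {
            $out .= "  EPISODE TITLE: " . $episode->eptitle . PHP_EOL;
            $out .= "  EPISODE NUMBER: " . $episode->eptnumber . PHP_EOL . PHP_EOL;
        }
    }
    return $out;
}

echo listBySeason($xml);
```

Because each season selects its own episodes, this variant prints every season heading exactly once, including seasons that have no episodes.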

      Here's a way to group building on the previous examples:

      $xml = simplexml_load_string($x); // assume XML in $x
      $yid = "";
      
      foreach ($xml->eplist->ep as $episode) { 
      
          // check if last yearid is different from current yearid
          // only if yes, echo the yearname 
          if ($yid != (string)$episode['yearid']) {
              echo "SEASON: " . $xml->xpath("//year[@yid='$episode[yearid]']/yearname")[0] . PHP_EOL . PHP_EOL;
          }
          echo "  EPISODE TITLE: " . $episode->eptitle . PHP_EOL;
          echo "  EPISODE NUMBER: " . $episode->eptnumber . PHP_EOL . PHP_EOL; 
      
          // store current yearid in $yid for next iteration
          $yid = (string)$episode['yearid'];
      }
      

      Note: the (string) cast makes sure the comparison uses a plain string rather than a SimpleXMLElement object.

      Output:

      SEASON: 1995, Season 1
      
        EPISODE TITLE: The First Episode
        EPISODE NUMBER: 1
      
      SEASON: 1996, Season 2
      
        EPISODE TITLE: Bla bla bla
        EPISODE NUMBER: 21
      
        EPISODE TITLE: Rar rar rar
        EPISODE NUMBER: 39
      
        EPISODE TITLE: Tra la la
        EPISODE NUMBER: 45
      
      SEASON: 1997, Season 3
      
        EPISODE TITLE: Donkey
        EPISODE NUMBER: 126
      

      see it working: https://eval.in/465044

      Further discussion: The code takes for granted that the <ep> nodes are already grouped in your XML. If you had a <ep> with y1 after y3...
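
One possible way to stop depending on the order of the <ep> nodes is to collect the episodes into an array keyed by yearid first and print afterwards. This is only a sketch under assumptions: the function name groupEpisodes(), the array layout and the out-of-order "Late Addition" episode are invented for illustration and are not part of the original data.

```php
<?php
// Sample data with a y1 episode appearing after a y3 episode (hypothetical).
$x = <<<XML
<besteplist>
  <yearlist>
    <year yid="y1"><yearname>1995, Season 1</yearname></year>
    <year yid="y3"><yearname>1997, Season 3</yearname></year>
  </yearlist>
  <eplist>
    <ep yearid="y1" eid="e1"><eptitle>The First Episode</eptitle><eptnumber>1</eptnumber></ep>
    <ep yearid="y3" eid="e5"><eptitle>Donkey</eptitle><eptnumber>126</eptnumber></ep>
    <ep yearid="y1" eid="e6"><eptitle>Late Addition</eptitle><eptnumber>2</eptnumber></ep>
  </eplist>
</besteplist>
XML;

// Hypothetical helper: returns [yid => ['name' => ..., 'episodes' => [...]]].
function groupEpisodes(SimpleXMLElement $xml): array
{
    $groups = [];

    // One pass over <year> builds the yid => yearname lookup.
    foreach ($xml->yearlist->year as $year) {
        $groups[(string) $year['yid']] = [
            'name'     => (string) $year->yearname,
            'episodes' => [],
        ];
    }

    // One pass over <ep> files each episode under its season.
    foreach ($xml->eplist->ep as $ep) {
        $yid = (string) $ep['yearid'];
        if (!isset($groups[$yid])) {
            continue; // yearid is #IMPLIED in the DTD, so it may be missing
        }
        $groups[$yid]['episodes'][] = [
            'title'  => (string) $ep->eptitle,
            'number' => (string) $ep->eptnumber,
        ];
    }

    return $groups;
}

foreach (groupEpisodes(simplexml_load_string($x)) as $group) {
    echo "SEASON: " . $group['name'] . PHP_EOL . PHP_EOL;
    foreach ($group['episodes'] as $ep) {
        echo "  EPISODE TITLE: " . $ep['title'] . PHP_EOL;
        echo "  EPISODE NUMBER: " . $ep['number'] . PHP_EOL . PHP_EOL;
    }
}
```

Seasons come out in <yearlist> order and episodes keep their document order within each season. The sketch also avoids running an xpath() query per episode, which may matter for larger files.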