
Simple_DOM question about finding classes


I am trying to do a simple extraction, but I keep ending up with unpredictable results.

I have this HTML code:

<div class="thread" style="margin-bottom:25px;"> 

<div class="message"> 

<span class="profile">Suzy Creamcheese</span> 

<span class="time">December 22, 2010 at 11:10 pm</span> 

<div class="msgbody"> 

<div class="subject">New digs</div> 

Hello thank you for trying our soap. <BR>  Jim.

</div> 
</div> 


<div class="message reply"> 

<span class="profile">Lars Jörgenmeier</span> 

<span class="time">December 22, 2010 at 11:45 pm</span> 

<div class="msgbody"> 

I never sold you any soap.

</div> 

</div> 

</div> 

I am trying to extract the outertext from "msgbody", but only when the "profile" equals a given name, like so:

$contents = $html->find('.msgbody');
$elements = $html->find('.profile');

$length = sizeof($contents);

while ($x != sizeof($elements)) {

    $var = $elements[$x]->outertext;

    // If profile = the right name
    if ($var = $name) {
        $text = $contents[$x]->outertext;
        echo $text;
    }

    $x++;
}

I get text from the wrong profiles, not the ones with the associations I need. Is there a way to just pull the desired info with one line of code?

Like if span-profile = "correct name" then pull its div-msgbody


Solution

  • Okay, I'm going to go with DOMXPath on this one. I'm not sure what 'outer text' is supposed to mean, but I'll go with this requirement:

    Like if span-profile = "correct name" then pull its div-msgbody

    First off, here's the minimal HTML test case I used:

    <html>
    <body>
    <div class="thread" style="margin-bottom:25px;"> 
    
    <div class="message"> 
    
    <span class="profile">Suzy Creamcheese</span> 
    
    <span class="time">December 22, 2010 at 11:10 pm</span> 
    
    <div class="msgbody"> 
    
    <div class="subject">New digs</div> 
    
    Hello thank you for trying our soap. <BR>  Jim.
    
    </div> 
    </div> 
    
    
    <div class="message reply"> 
    
    <span class="profile">Lars Jörgenmeier</span> 
    
    <span class="time">December 22, 2010 at 11:45 pm</span> 
    
    <div class="msgbody"> 
    
    I never sold you any soap.
    
    </div> 
    
    </div> 
    
    </div>
    </body>
    </html>
    

    So, we'll make an XPath query for this. Let's show the whole thing, then break it down:

    $messages = $xpath->query("//span[@class='profile' and contains(.,'$profile_name')]/../div[@class='msgbody']");
    

    The breakdown:

    //span

    Give me spans

    //span[@class='profile']

    Give me spans where the class is profile

    //span[@class='profile' and contains(.,'$profile_name')]

    Give me spans where the class is profile and the text inside the span contains $profile_name, the name you're after

    //span[@class='profile' and contains(.,'$profile_name')]/../

    Same as above, then go up one level to the span's parent, which is the enclosing message div (<div class="message"> or <div class="message reply">)

    //span[@class='profile' and contains(.,'$profile_name')]/../div[@class='msgbody']

    Same again, and finally give me all divs directly under that message div where the class is msgbody
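
    To see what each step of the breakdown selects, the partial expressions can be run one at a time and their node counts printed. This is only a sketch: the inline `$html` string is a trimmed copy of the test case, and the `$steps` and `$counts` variables are names made up for the illustration. The leading meta tag is an assumption on my part, added so that libxml treats the markup as UTF-8.

    ```php
    <?php
    // Trimmed, illustrative copy of the test case. The meta tag declares UTF-8
    // so that non-ASCII names such as "Jörgenmeier" are not mis-decoded.
    $html = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
          . '<div class="thread">'
          . '<div class="message"><span class="profile">Suzy Creamcheese</span>'
          . '<span class="time">December 22, 2010 at 11:10 pm</span>'
          . '<div class="msgbody">Hello thank you for trying our soap.</div></div>'
          . '<div class="message reply"><span class="profile">Lars Jörgenmeier</span>'
          . '<span class="time">December 22, 2010 at 11:45 pm</span>'
          . '<div class="msgbody">I never sold you any soap.</div></div>'
          . '</div>';

    libxml_use_internal_errors(true); // keep parser warnings out of the output
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    $profile_name = 'Suzy Creamcheese';

    // The same expression, built up one step at a time as in the breakdown.
    $steps = [
        "//span",
        "//span[@class='profile']",
        "//span[@class='profile' and contains(.,'$profile_name')]",
        "//span[@class='profile' and contains(.,'$profile_name')]/..",
        "//span[@class='profile' and contains(.,'$profile_name')]/../div[@class='msgbody']",
    ];

    $counts = [];
    foreach ($steps as $step) {
        $nodes = $xpath->query($step);
        $counts[] = $nodes->length;
        echo $nodes->length . " node(s): " . $step . "\n";
    }
    ```

    With this markup the counts shrink from four spans, to two profile spans, to the single span, parent div, and msgbody div belonging to the requested name.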

    Now then, here's a sample of the PHP code:

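    $doc = new DOMDocument();
    // Parse the test case shown above, saved as test.html
    $doc->loadHTMLFile("test.html");

    $xpath = new DOMXPath($doc);
    $profile_name = 'Lars Jörgenmeier';
    // Select the msgbody div that sits next to the matching profile span
    $messages = $xpath->query("//span[@class='profile' and contains(.,'$profile_name')]/../div[@class='msgbody']");
    foreach ($messages as $message) {
      // nodeValue is the text content of the div, without any tags
      echo trim($message->nodeValue) . "\n";
    }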
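
    If 'outer text' is meant the way Simple HTML DOM uses it, that is, the element's markup including its own tags, then nodeValue is not quite it, since it returns text only. One way to get the markup, as far as I can tell, is to pass the node to DOMDocument::saveHTML() (the node argument needs PHP 5.3.6 or later). A minimal sketch with inline markup instead of test.html; the `$html` string and the `$outer` array are made up for the example, and the meta tag is my assumption for getting the "ö" decoded as UTF-8.

    ```php
    <?php
    // Illustrative inline markup. The meta tag declares UTF-8 so that libxml
    // does not mis-decode the "ö" in the name.
    $html = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
          . '<div class="message reply"><span class="profile">Lars Jörgenmeier</span>'
          . '<div class="msgbody">I never sold you any soap.</div></div>';

    libxml_use_internal_errors(true); // keep parser warnings out of the output
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    $profile_name = 'Lars Jörgenmeier';
    $messages = $xpath->query("//span[@class='profile' and contains(.,'$profile_name')]/../div[@class='msgbody']");

    $outer = [];
    foreach ($messages as $message) {
        // saveHTML($node) serializes just that node, tags included.
        $outer[] = $doc->saveHTML($message);
    }
    echo implode("\n", $outer) . "\n";
    ```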
    

    XPath is very powerful like this. I recommend looking over a basic tutorial, then you can check the XPath standard if you want to see more advanced usage.
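
    As one example of slightly more advanced usage, the query above can be tightened in three places: `@class='profile'` only matches when profile is the sole class, `contains()` also matches names that merely include the requested one, and a name with an apostrophe breaks the quoted string. The sketch below is one possible way to handle these, not the only one. The helpers `xpath_literal()` and `class_test()` and the sample markup are hypothetical names and data invented for the illustration, and the arrow function assumes PHP 7.4 or later.

    ```php
    <?php
    // Hypothetical helper: turn a PHP string into an XPath 1.0 string literal.
    // XPath 1.0 has no escape character, so a value containing both quote
    // types has to be assembled with concat().
    function xpath_literal(string $value): string
    {
        if (strpos($value, "'") === false) {
            return "'" . $value . "'";
        }
        if (strpos($value, '"') === false) {
            return '"' . $value . '"';
        }
        // Split on single quotes and glue the parts back with a quoted "'".
        $parts = array_map(fn($p) => "'" . $p . "'", explode("'", $value));
        return 'concat(' . implode(', "\'", ', $parts) . ')';
    }

    // Hypothetical helper: match one class name inside a space-separated list.
    function class_test(string $class): string
    {
        return "contains(concat(' ', normalize-space(@class), ' '), ' $class ')";
    }

    // Made-up sample markup with extra classes and two similar names.
    $html = '<div class="message"><span class="profile vip"> Conan O\'Brien </span>'
          . '<div class="msgbody long">Hi.</div></div>'
          . '<div class="message"><span class="profile">Conan O\'Brien Jr</span>'
          . '<div class="msgbody">Other.</div></div>';

    libxml_use_internal_errors(true); // keep parser warnings out of the output
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    $name = "Conan O'Brien";
    // normalize-space(.) = ... compares the whole trimmed text, not a substring.
    $query = '//span[' . class_test('profile')
           . ' and normalize-space(.) = ' . xpath_literal($name) . ']'
           . '/../div[' . class_test('msgbody') . ']';

    $bodies = [];
    foreach ($xpath->query($query) as $message) {
        $bodies[] = trim($message->nodeValue);
    }
    echo implode("\n", $bodies) . "\n";
    ```

    Here only the first message is selected: the second profile contains the requested name but is not equal to it.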