php web-scraping domxpath domdocument

XPath reverse searching


Is there a way, when using DOMDocument and DOMXPath, to search in reverse (from the end of the page moving up instead of from the top down)? If so, how would I do this?

I am scraping this web site: http://www.sturmfh.com/obit-display.jhtml?DB=update/obits/dbase&DO=display&ID=1189477693_24578

I only want to scrape the three obituary paragraphs, so I figured it would be easiest to start at the end and move up.


Solution

  • Use:

    (//p)[position() > count(//p) - 3]
    

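    This selects the last three p elements in the document (or all of them, if there are fewer than three). Positions are 1-based, so with n p elements the predicate keeps positions n-2, n-1 and n. The parentheses matter: without them, //p[...] would test each p's position among its siblings under the same parent rather than among all p elements in the document.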

    XSLT-based verification:

    <xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output omit-xml-declaration="yes" indent="yes"/>
     <xsl:strip-space elements="*"/>
    
     <xsl:template match="node()|@*">
         <xsl:copy-of select="(//p)[position() > count(//p) - 3]"/>
     </xsl:template>
    </xsl:stylesheet>
    

    When applied to the document referenced in the question, this transformation evaluates the XPath expression and outputs the selected p elements.

    The result is:

    <p>
                    If you would like to share your thoughts and memories,<br/> we will deliver your message to the family.<br/>
       <a href="mailto:[email protected]?Subject=For%20the%20Family%20of%20Lyle%20Meier">Click</a>
       <a href="mailto:[email protected]?Subject=For%20the%20Family%20of%20Lyle%20Meier">
          <img src="/images/email_condol.gif" alt="Logo" border="0" align="middle"/>
       </a>
       <a href="mailto:[email protected]?Subject=For%20the%20Family%20of%20Lyle%20Meier">here</a>.
            </p>
    <p>To Request a Tribute Folder
                    <br/>
       <a href="./obit-foldreq.jhtml?fname=Lyle&amp;lname=Meier">Click</a>
       <a href="./obit-foldreq.jhtml?fname=Lyle&amp;lname=Meier">
          <img src="/images/email_condol.gif" border="0" alt="View" align="top"/>
       </a>
       <a href="./obit-foldreq.jhtml?fname=Lyle&amp;lname=Meier">here</a>
    </p>
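Since the question uses PHP, here is a minimal sketch of passing the same expression to DOMXPath::query(). It assumes the page HTML is already in a string; the $html below is a small hypothetical stand-in, not the real obituary page. It also shows an equivalent predicate using last() (the size of the (//p) node-set), and how to walk the resulting DOMNodeList from the end backwards if you want to process nodes bottom-up.

```php
<?php
// Minimal sketch: select the last three <p> elements with DOMXPath.
// $html is a hypothetical stand-in for the scraped page.
$html = <<<'HTML'
<html><body>
<p>Header text</p>
<p>Obituary paragraph 1</p>
<p>Obituary paragraph 2</p>
<p>Obituary paragraph 3</p>
</body></html>
HTML;

$doc = new DOMDocument();
// Real pages are often not well-formed; collect libxml parse errors
// internally instead of emitting warnings.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// The expression from the answer: positions are counted over all <p> in the document.
$lastThree = $xpath->query('(//p)[position() > count(//p) - 3]');

// Equivalent form: last() is the number of nodes in (//p).
$same = $xpath->query('(//p)[position() >= last() - 2]');

// DOMNodeList is in document order; to work "from the end up", index backwards.
$texts = [];
for ($i = $lastThree->length - 1; $i >= 0; $i--) {
    $texts[] = trim($lastThree->item($i)->textContent);
}

print_r($texts);
```

Indexing the DOMNodeList backwards keeps the XPath itself simple; the selection is by position from the end, and only the processing order is reversed.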