
Excluding blockquote from forum post with xpath


I'm trying to extract forum posts (message2) while getting rid of the blockquote (message1). Here is the HTML (post content modified/simplified):

<div class="cPost_contentWrap ipsPad">
  <div data-controller="core.front.core.lightboxedImages" class="ipsType_normal ipsType_richText ipsContained" itemprop="text" data-role="commentContent">
    <blockquote data-ipsquote-contentclass="forums_Topic" data-ipsquote-contentid="40244" data-ipsquote-contenttype="forums" data-ipsquote-contentapp="forums" data-cite="aries_gurl" data-ipsquote-username="aries_gurl" data-ipsquote-contentcommentid="584324" class="ipsQuote" data-ipsquote="">
      <div>
        (message1)
      </div>
    </blockquote>

    <p>(message2)</p>
  </div>
</div>

I am trying with the following XPath query:

//div[@class="ipsType_normal ipsType_richText ipsContained"]/p[not(@class="ipsQuote")]

For some reason, however, this query returns the matching paragraph from every post with the same class rather than just the one in the current node. Taking the above as a reference, the returned results would be: message2 message2 message2 message2, and so on (one per post, N messages in total).

Is there a way I can get one message at a time? Thank you!


Solution

  • Is there a way I can get one message at a time?

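    Yes ;) Wrap the whole expression in parentheses and index the result:

    (//div[@class="ipsType_normal ipsType_richText ipsContained"]/p[not(@class="ipsQuote")])[1]

    This returns the first message. Use [n], with n = 1..x, for the others. The parentheses matter: without them, [1] is evaluated separately for each parent div, so it would still match one p in every post.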
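
    The difference between indexing the whole result and indexing per parent can be sketched in Python. This is only an illustration, assuming the standard-library xml.etree.ElementTree module and a hypothetical two-post page modelled on the snippet in the question (the texts "reply A" and "reply B" are made up). ElementTree implements only a subset of XPath and accepts neither a parenthesised expression nor not(), so indexing the result list in Python stands in for (...)[n] here. The helper name nth_reply is also hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical page with two posts, each shaped like the snippet in the question.
PAGE = """
<html><body>
  <div class="ipsType_normal ipsType_richText ipsContained">
    <blockquote class="ipsQuote"><div>quoted text A</div></blockquote>
    <p>reply A</p>
  </div>
  <div class="ipsType_normal ipsType_richText ipsContained">
    <blockquote class="ipsQuote"><div>quoted text B</div></blockquote>
    <p>reply B</p>
  </div>
</body></html>
"""

# Direct <p> children of each post div. The blockquote is never matched,
# because this step selects only <p> elements.
POST_P = ".//div[@class='ipsType_normal ipsType_richText ipsContained']/p"

root = ET.fromstring(PAGE)

# All replies on the page, in document order.
all_replies = [p.text for p in root.findall(POST_P)]


def nth_reply(n):
    """Return the text of the n-th reply (1-based), like (expr)[n] in XPath."""
    return root.findall(POST_P)[n - 1].text


# Without parentheses, [1] is applied per parent div,
# so every post still contributes its first <p>.
first_per_post = [p.text for p in root.findall(POST_P + "[1]")]

print(all_replies)     # ['reply A', 'reply B']
print(nth_reply(1))    # reply A
print(nth_reply(2))    # reply B
print(first_per_post)  # ['reply A', 'reply B']
```

    With a full XPath 1.0 engine, the parenthesised query from the answer can be passed as-is and no indexing in the host language is needed.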