I'm trying to extract forum posts (message2) while getting rid of the blockquote (message1). Here is the HTML (post content modified/simplified):
<div class="cPost_contentWrap ipsPad">
<div data-controller="core.front.core.lightboxedImages" class="ipsType_normal ipsType_richText ipsContained" itemprop="text" data-role="commentContent">
<blockquote data-ipsquote-contentclass="forums_Topic" data-ipsquote-contentid="40244" data-ipsquote-contenttype="forums" data-ipsquote-contentapp="forums" data-cite="aries_gurl" data-ipsquote-username="aries_gurl" data-ipsquote-contentcommentid="584324" class="ipsQuote" data-ipsquote="">
<div>
(message1)
</div>
</blockquote>
<p>(message2)</p>
</div>
</div>
I am trying with the following XPath query:
//div[@class="ipsType_normal ipsType_richText ipsContained"]/p[not(@class="ipsQuote")]
For some reason, however, this query returns all subsequent posts as well, rather than just the one in the current node. Taking the above as a reference, the returned results would be: message2 message2 message2 message2, and so on (N messages in total).
Is there a way I can get one message at a time? Thank you!
Is there a way I can get one message at a time?
Yes ;) use:
(//div[@class="ipsType_normal ipsType_richText ipsContained"]/p[not(@class="ipsQuote")])[1]
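for the first one, and [n] with n = 1..x for the others. The parentheses matter: they make [n] index into the whole result set, whereas without them p[n] would select the n-th p inside each matching div.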
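If you are looping over the posts anyway, another option is to query relative to each post node instead of starting from // (which always searches the whole document). Below is a minimal sketch of that idea, assuming Python's standard xml.etree.ElementTree; it only supports a subset of XPath (no not() and no parenthesized (...)[n]), so the indexing is done with Python lists. The PAGE string and the message3 post are made up for illustration, and a real forum page would probably need an HTML parser rather than an XML one.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-post page modeled on the snippet in the question.
PAGE = """
<html><body>
<div class="cPost_contentWrap ipsPad">
<div class="ipsType_normal ipsType_richText ipsContained">
<blockquote class="ipsQuote"><div>(message1)</div></blockquote>
<p>(message2)</p>
</div>
</div>
<div class="cPost_contentWrap ipsPad">
<div class="ipsType_normal ipsType_richText ipsContained">
<p>(message3)</p>
</div>
</div>
</body></html>
""".strip()

root = ET.fromstring(PAGE)
CONTENT = "div[@class='ipsType_normal ipsType_richText ipsContained']"

# Whole-document query: every <p> of every post, indexed as one list.
# This mirrors (//div[...]/p)[n], except that Python lists are 0-based.
all_paragraphs = root.findall(f".//{CONTENT}/p")
first_message = all_paragraphs[0].text

# Per-post query: "p" is evaluated relative to each content div,
# so only that post's paragraphs come back. The blockquote is skipped
# because it is not a <p> element.
posts = []
for div in root.findall(f".//{CONTENT}"):
    posts.append([p.text for p in div.findall("p")])

print(first_message)
print(posts)
```

Here posts holds one list per post, so each message can be handled on its own without counting positions across the whole page.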