I used the MediaWiki API to get a Wikipedia page. After getting the HTML content, I tried using
p:not(h2 ~ p)
to get the page summary paragraphs, which should be the paragraphs before the table of contents element. It returns the part I want but also some additional paragraphs. Where is the problem?
p:not(h2 ~ p) matches every paragraph on the page that has no h2 before it within the same parent. That includes nested paragraphs, paragraphs outside the main content altogether, and so on, because those paragraphs don't share a parent element with any h2, so h2 ~ p can never match them. You don't want those; you only want the paragraphs that come before the first h2 among the children of the main content element.
For that, you want to anchor the outer p selector to the parent element with the child combinator (>). The parent element you want is .mw-parser-output:
.mw-parser-output > p:not(h2 ~ p)
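To see why the unanchored selector picks up extra paragraphs, here is a rough sketch in Python that emulates the sibling test by hand with the standard library instead of running a real CSS engine. The markup is a made-up, heavily simplified stand-in for a Wikipedia page (the sidebar and table are assumptions for illustration, not actual API output), and not_after_h2 is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for a Wikipedia page (not real API output).
HTML = """
<div id="content">
  <div class="sidebar"><p>sidebar note</p></div>
  <div class="mw-parser-output">
    <p>summary one</p>
    <table><tr><td><p>infobox text</p></td></tr></table>
    <p>summary two</p>
    <h2>History</h2>
    <p>history text</p>
  </div>
</div>
"""

def not_after_h2(parent):
    """Emulate `p:not(h2 ~ p)` among the direct children of one parent:
    yield each <p> child that has no earlier <h2> sibling."""
    seen_h2 = False
    for child in parent:
        if child.tag == "h2":
            seen_h2 = True
        elif child.tag == "p" and not seen_h2:
            yield child

root = ET.fromstring(HTML)

# Unanchored: the sibling test is applied inside every parent on the page,
# so paragraphs in the sidebar and in the table cell also pass.
unanchored = [p.text for parent in root.iter() for p in not_after_h2(parent)]

# Anchored: only direct children of .mw-parser-output are considered,
# which is what `.mw-parser-output > p:not(h2 ~ p)` expresses.
content = root.find(".//div[@class='mw-parser-output']")
anchored = [p.text for p in not_after_h2(content)]

print(unanchored)
print(anchored)
```

The unanchored list contains the sidebar and table paragraphs in addition to the two summary paragraphs, while the anchored list contains only the summary paragraphs. In a browser or jQuery you would simply pass the anchored selector string to the query function; the hand-written loop is only there to make the matching rule visible.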