Jsoup: get Wikipedia page summary

I used the MediaWiki API to fetch a Wikipedia page. After getting the HTML content, I tried using

p:not(h2 ~ p)

to get the page summary paragraphs, which should be the paragraphs before the table of contents element. It returns the part I want but also includes additional paragraphs. Where is the problem?

Solution

p:not(h2 ~ p) matches every paragraph on the page that has no h2 before it in the same parent. This includes nested paragraphs, paragraphs outside the main content altogether, and so on, because none of those paragraphs share a parent element with an h2. You don't want those; you only want the paragraphs that appear before the first h2 within the main content's parent element.

For that, anchor the outer p selector to the parent element with the child combinator (>), so that only its direct children are considered. The parent element you want is .mw-parser-output:

.mw-parser-output > p:not(h2 ~ p)
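
As a rough illustration of the difference, here is a minimal sketch using jsoup. The class name SummarySelector, the method summaryParagraphs, and the sample HTML are made up for this example, and it assumes (as the answer does) that the h2 headings are direct children of .mw-parser-output; real MediaWiki output may be structured differently.

```java
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class SummarySelector {

    // Hypothetical helper: returns the text of each paragraph that is a direct
    // child of .mw-parser-output and has no preceding h2 sibling.
    static List<String> summaryParagraphs(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select(".mw-parser-output > p:not(h2 ~ p)").eachText();
    }

    public static void main(String[] args) {
        // Simplified stand-in for a Wikipedia page (an assumption, not real output).
        String html =
              "<div class=\"mw-parser-output\">"
            +   "<p>Lead one.</p>"
            +   "<div><p>Nested note.</p></div>"
            +   "<p>Lead two.</p>"
            +   "<h2>History</h2>"
            +   "<p>Section text.</p>"
            + "</div>"
            + "<div id=\"footer\"><p>Footer text.</p></div>";

        Document doc = Jsoup.parse(html);

        // Unanchored: also picks up the nested and footer paragraphs,
        // since neither has an h2 sibling before it.
        System.out.println(doc.select("p:not(h2 ~ p)").eachText());

        // Anchored: only the direct-child paragraphs before the first h2.
        System.out.println(summaryParagraphs(html));
    }
}
```

With this sample input, the unanchored selector should also return the nested and footer paragraphs, while the anchored one should return only the two lead paragraphs.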
