
Parse specific p values with jsoup


I have the following extract of a much longer page:

<h2 id="Supportedplatforms-Java">Java</h2> 
 <section class="layout-section layout-section-two_equal"> 
  <div class="content-section"> 
   <p><strong>Oracle JRE / JDK:</strong></p>
   <p><img alt="(tick)" data-emoticon-name="tick" class="emoticon emoticon-tick" src="/s/en_GB/7202/e97769bbf919c0bd667762fc102f557beacb7f94/_/images/icons/emoticons/check.png">&nbsp;Java 8</p>
   <p><img alt="(tick)" data-emoticon-name="tick" class="emoticon emoticon-tick" src="/s/en_GB/7202/e97769bbf919c0bd667762fc102f557beacb7f94/_/images/icons/emoticons/check.png">&nbsp;Java 11</p>
   <p><strong>OpenJDK:</strong></p>
   <p><strong><img alt="(tick)" data-emoticon-name="tick" class="emoticon emoticon-tick" src="/s/en_GB/7202/e97769bbf919c0bd667762fc102f557beacb7f94/_/images/icons/emoticons/check.png">&nbsp;</strong>Java 8</p>
   <p><img alt="(tick)" data-emoticon-name="tick" class="emoticon emoticon-tick" src="/s/en_GB/7202/e97769bbf919c0bd667762fc102f557beacb7f94/_/images/icons/emoticons/check.png">&nbsp;Java 11</p> 
  </div> 
<div class="content-section">

All I want is the following result:
Oracle JRE/JDK:
Java 8
Java 11
OpenJDK:
Java 8
Java 11

I'm using jsoup in Groovy:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
def url = "https://url";
def document = Jsoup.connect(url).get()

I have been trying for the last few hours, to no avail, with:

Elements test = document.select("#Supportedplatforms-Java > p")

...and hundreds of variations.

If you have any pointers, I'd be happy to hear them!

Thanks


Solution

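  • The selector "#Supportedplatforms-Java > p" matches nothing because the <p> elements are not children of the <h2>. They sit inside div.content-section, within the section.layout-section that follows the heading. Select them through those classes instead, then call text() on each matched element to get its visible text:

    Elements test = document.select(".layout-section .content-section p")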
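
Since the extract comes from a much longer page, a selector based only on the layout classes may also pick up paragraphs from other sections. Below is a minimal sketch of one way to limit the match to the block right after the Java heading and print one line per paragraph. It makes several assumptions: the HTML is a trimmed copy of the extract in the question (image attributes shortened), the <section> immediately follows the <h2> on the real page as it does in the extract, and the @Grab line with its jsoup version is only there so the script runs standalone.

```groovy
@Grab('org.jsoup:jsoup:1.17.2') // assumed version, only so the script is self-contained
import org.jsoup.Jsoup

// Trimmed-down copy of the extract from the question (image attributes shortened).
def html = '''
<h2 id="Supportedplatforms-Java">Java</h2>
<section class="layout-section layout-section-two_equal">
  <div class="content-section">
    <p><strong>Oracle JRE / JDK:</strong></p>
    <p><img alt="(tick)" class="emoticon emoticon-tick" src="check.png">&nbsp;Java 8</p>
    <p><img alt="(tick)" class="emoticon emoticon-tick" src="check.png">&nbsp;Java 11</p>
    <p><strong>OpenJDK:</strong></p>
    <p><strong><img alt="(tick)" class="emoticon emoticon-tick" src="check.png">&nbsp;</strong>Java 8</p>
    <p><img alt="(tick)" class="emoticon emoticon-tick" src="check.png">&nbsp;Java 11</p>
  </div>
</section>
'''

def document = Jsoup.parse(html)

// "+" is the adjacent-sibling combinator: the <section> directly after the heading.
def lines = document.select('#Supportedplatforms-Java + section p')
        // Defensively turn any non-breaking space left by &nbsp; into a plain space, then trim.
        .collect { it.text().replace('\u00A0', ' ').trim() }
        // Drop paragraphs that contain no text at all.
        .findAll { it }

lines.each { println it }
```

Against the real page, the inline string would be replaced by the document fetched with Jsoup.connect(url).get(), as in the question. If the section does not directly follow the heading there, the adjacent-sibling selector will match nothing and the class-based selector from the answer is the fallback.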