java, jsoup, html-parsing

Search for a particular string in a <td> of HTML code and, if present, print the next <td> value using jsoup


I have HTML like the following:

<html>
<body>

<div id="1">
    <table>
        <tr>
            <td>ID</td>
            <td>:</td>
            <td>123</td>
        </tr>   

        <tr>
            <td>Status</td>
            <td>:</td>
            <td>Fail</td>
        </tr>
    </table>
</div>
<div id="2">
    <table>
        <tr>
            <td>ID</td>
            <td>:</td>
            <td>456</td>
        </tr>   

        <tr>
            <td>Status</td>
            <td>:</td>
            <td>Success</td>
        </tr>
    </table>
</div>
<div id="3">
    <table>
        <tr>
            <td>ID</td>
            <td>:</td>
            <td>789</td>
        </tr>   

        <tr>
            <td>Status</td>
            <td>:</td>
            <td>Fail</td>
        </tr>
    </table>
</div>
<div id="4">
    <table>
        <tr>
            <td>ID</td>
            <td>:</td>
            <td>135</td>
        </tr>   

        <tr>
            <td>Status</td>
            <td>:</td>
            <td>Success</td>
        </tr>
    </table>
</div>

</body>
</html>

I need to parse this HTML. For every div, I need to search its td cells for "Status" and, if it is present, get the value of the 2nd adjacent td, i.e., Fail / Success. If that value is "Fail", I then need to search the same div for "ID" and, if it is present, print the value of its 2nd adjacent td, i.e., 123 and 789 in this case.

The pseudocode might look like this:

if (code contains "Status")
{
    1. Get its 2nd adjacent td value, i.e., Fail/Success

    if (td value is "Fail")
    {
        1. Search for "ID"
        if ("ID" present)
        {
            Print the number, i.e., its 2nd adjacent <td> value
        }
    }
}

I tried something like this in JavaScript (jQuery):

var t0 = $(this).find('tr:has(td:contains("Test Status"))');
if (t0.length)
{
    // Keep only the part of the row text that follows the colon
    var str0 = t0.text().trim();
    str0 = /:(.+)/.exec(str0)[1];

    if (str0 == "FAIL")
    {
        var t1 = $(this).find('tr:has(td:contains("Test ID"))');
        if (t1.length)
        {
            var str = t1.text().trim();
            str = /:(.+)/.exec(str)[1];
            testIDArray.push(str);
            // alert(str);
        }
    }
}

But I need to do it in Java using jsoup. I tried something like this:

String htmlString = fileContent;
Document document = Jsoup.parse(htmlString);
Elements elements = document.body().select("div");
for (Element element : elements) {
    String link = element.select("td:contains(Test Status)").attr("<tr>");

    if (link != null || !(link.isEmpty()))
    {
        System.out.println(link);
        System.out.println("=========================");
    }
}

Kindly help me with this. I don't know how to proceed.

Thanks in advance.



Solution

  • You can use Java Streams to solve this:

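    // Needs java.util.List and java.util.stream.Collectors
    List<String> failedIds = document.body().select("div table").stream()
            .map(e -> e.select("tr"))                                                       // rows of each table
            .filter(trs -> "FAIL".equalsIgnoreCase(trs.last().select("td").last().text()))  // last row, last cell: status
            .map(trs -> trs.first().select("td").last().text())                             // first row, last cell: ID
            .collect(Collectors.toList());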
    

    The result will be:

    [123, 789]
    

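    First you select div table to get all the tables. Then you collect the tr rows of each table and keep only the tables whose Status is Fail, i.e., whose last row's last cell reads "Fail" (trs -> "FAIL".equalsIgnoreCase(trs.last().select("td").last().text())). At the end you map each remaining table to its ID, the last cell of its first row (trs -> trs.first().select("td").last().text()). Note that this relies on the ID row coming first and the Status row coming last in every table.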

    To print the ids instead of creating a List you can use .forEach():

    document.body().select("div table").stream()
            .map(e -> e.select("tr"))
            .filter(trs -> "FAIL".equalsIgnoreCase(trs.last().select("td").last().text()))
            .map(trs -> trs.first().select("td").last().text())
            .forEach(System.out::println);
    

    Alternatively you can use this (without Streams):

    for (Element e : document.body().select("div table")) {
        Elements trs = e.select("tr");
        if ("FAIL".equalsIgnoreCase(trs.last().select("td").last().text())) {
            String id = trs.first().select("td").last().text();
            System.out.println(id);
        }
    }
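
    The versions above depend on row position (ID first, Status last). If the row order might vary, looking rows up by their label is one alternative. The following is a minimal sketch, not part of the original answer: the class name FailedIdFinder and the helpers valueFor and failedIds are made up for illustration, and it assumes that every data row has the three-cell layout label, colon, value shown in the question and that the divs are not nested.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class FailedIdFinder {

    // Hypothetical helper: finds the first row inside `scope` whose first cell
    // equals `label` (ignoring case) and returns the text of the cell two
    // positions after it. Returns null when no such row exists.
    static String valueFor(Element scope, String label) {
        for (Element row : scope.select("tr")) {
            Elements cells = row.select("td");
            if (cells.size() >= 3 && label.equalsIgnoreCase(cells.get(0).text())) {
                return cells.get(2).text();
            }
        }
        return null;
    }

    // Collects the ID of every div whose Status value is "Fail".
    static List<String> failedIds(String html) {
        Document document = Jsoup.parse(html);
        List<String> ids = new ArrayList<>();
        for (Element div : document.select("div")) {
            // equalsIgnoreCase(null) is false, so divs without a Status row are skipped
            if ("Fail".equalsIgnoreCase(valueFor(div, "Status"))) {
                String id = valueFor(div, "ID");
                if (id != null) {
                    ids.add(id);
                }
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // Shortened version of the question's HTML: one failing and one passing div
        String html =
              "<div id=\"1\"><table>"
            + "<tr><td>ID</td><td>:</td><td>123</td></tr>"
            + "<tr><td>Status</td><td>:</td><td>Fail</td></tr>"
            + "</table></div>"
            + "<div id=\"2\"><table>"
            + "<tr><td>ID</td><td>:</td><td>456</td></tr>"
            + "<tr><td>Status</td><td>:</td><td>Success</td></tr>"
            + "</table></div>";

        System.out.println(failedIds(html)); // prints [123]
    }
}
```

    With the full HTML from the question, failedIds returns [123, 789]. The attempts in the question search for "Test Status" and "Test ID", so if the real pages use those labels, pass them to valueFor instead of "Status" and "ID".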