Tags: groovy, xml-parsing, jsoup, groovy-console

How to parse XML using Groovy


I'm new to XML parsing in Groovy. I'm trying to parse the XML file below:

<font face=Tahoma size=2>
   Team,<br/><br/>  Please find below the test summary details for the 'Test' execution.<br/><br/><b><U>Transaction Summary Table:</U></b><br/><br/>
   <table border=1 CELLPADDING =3 style='font-family:Tahoma;font-size:12'>
      <tr>
         <b>
            <th bgcolor=#C0C0C0> TransactionName </th>
            <th bgcolor=#C0C0C0> AverageLatency </th>
            <th bgcolor=#C0C0C0> MinimumLatency </th>
            <th bgcolor=#C0C0C0> MaximumLatency </th>
            <th bgcolor=#C0C0C0> AverageElapsedTime </th>
            <th bgcolor=#C0C0C0> MinimumElapsedTime </th>
            <th bgcolor=#C0C0C0> MaximumElapsedTime </th>
            <th bgcolor=#C0C0C0> TotalCount </th>
            <th bgcolor=#C0C0C0> PassPercentage </th>
         </b>
      </tr>
      <tr>
         <td>1 /aumentum/</td>
         <td>
            <center>1648.0</center>
         </td>
         <td>
            <center>1240</center>
         </td>
         <td>
            <center>2900</center>
         </td>
         <td>
            <center>1907.0</center>
         </td>
         <td>
            <center>1495</center>
         </td>
         <td>
            <center>3140</center>
         </td>
         <td>
            <center>45</center>
         </td>
         <td>
            <center>100.0</center>
         </td>
      </tr>
      <tr>
         <td>T01_Aumentum_Home</td>
         <td>
            <center>6.0</center>
         </td>
         <td>
            <center>1</center>
         </td>
         <td>
            <center>10</center>
         </td>
         <td>
            <center>1956.0</center>
         </td>
         <td>
            <center>1490</center>
         </td>
         <td>
            <center>3806</center>
         </td>
         <td>
            <center>213</center>
         </td>
         <td>
            <center>0.0</center>
         </td>
      </tr>
 </tbody>
   </table>
   <br/><br/>Thanks,<br/>Performance Team.
</font>
<br/><br/>

Expected Result:

 [{
"transaction name":"1 /aumentum/", 
"AverageLatency ":"1648.0",
"Minimum latency":"1240",
"MaximumLatency ":"2900",
"AverageElapsedTime":"1907.0",
"MinimumElapsedTime":"1495",
"MaximumElapsedTime":"3140",
"TotalCount":"45",
"PassPercentage":"100.0"
},
{
"transaction name": "1 /aumentum/",
"AverageLatency ":"1648.0",
"Minimum latency":"1240",
"MaximumLatency ":"2900",
"AverageElapsedTime":"1907.0",
"MinimumElapsedTime":"1495",
"MaximumElapsedTime":"3140",
"TotalCount":"45",
"PassPercentage":"100.0"

}]

I have got the children of the first row using docParser.getElementsByTag("tr").first()

Here is the error I get:

Exception thrown
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
    at org.jsoup.select.Elements.get(Elements.java:519)
    at org.jsoup.nodes.Element.child(Element.java:174)
    at org.jsoup.nodes.Element$child$0.call(Unknown Source)
    at CommonUtils.parseLRHTMLReport(jmeteragent.groovy:304)
    at CommonUtils$parseLRHTMLReport.call(Unknown Source)

Here is what I have done so far:

def transactiondetails12 = null
def iterator12 = 0
int count1 = 0
def violcounts = 0
def violations = null;

tmpElement = docParser.getElementsByTag("tr").first()
println tmpElement.children()
// tmpElement= tmpElement.child(0)
// println "#########tmpElement#########:" +tmpElement


for (element in tmpElement.children()) {
    if (iterator12 == 0) {
        // transactiondetails1 = "<table border=1 CELLPADDING =3 style='font-family:Tahoma;font-size:12'><tr><b><th bgcolor=#C0C0C0>"  +
        element.child(0).text().trim() + "</th><th bgcolor=#C0C0C0>" + element.child(2).text().trim() + "</th><th bgcolor=#C0C0C0>" +
                element.child(3).text().trim() + "</th><th bgcolor=#C0C0C0>" + element.child(4).text().trim() + "</th></b></tr>"
        iterator12 = 1;
        count1++;
        //  println "nqwlieufrh    2938ry    `9p23dhWCDNJ    p3fu89    Q2390RUD"+transactiondetails1
    } else {
        count1++;
        if (count1 <= 5) {

            //   println "iterator1iterator1iterator1iterator1"+iterator1++
            transactiondetails12 = transactiondetails12 + "<tr><td>" + element.child(0).text().trim() + "</td><td><center>" +
                    element.child(2).text().trim() + "</center></td><td><center>" +
                    element.child(3).text().trim() + "</center></td><td><center>" +
                    element.child(4).text().trim()
            println "transactiondetails12" + transactiondetails12
            //   println "3215463654156436212315465123011482145634217225445622341"+element.child(4).text().trim()
            String violation1 = element.child(1).text()
            // violation=Integer.valueOf(violation1)
            // violation=Integer.parseInt(violation1)

            //   if(violation1>=0)
            if (violation1.length() > 0) {
                violcounts++
            }


        }
    }

}

I have no idea how to map the tmpElement.children() values. Any advice on this would be helpful. Thanks in advance.


Solution

  • The sample you provided uses the jsoup library, which is designed for HTML DOM manipulation. The solution to your problem is to use the correct selectors to extract the data.

    Consider the following example:

    import groovy.json.JsonOutput
    import org.jsoup.nodes.Element

    // Header names, in column order
    def headers = docParser.select("tr > th").collect { it.text() }
    def result = []
    
    // Build one map per data row, keyed by header name
    docParser.select("tr:has(td)").each { tr ->
        def obj = [:]
        tr.select("td").eachWithIndex { Element td, int i ->
            obj[headers[i]] = td.text()
        }
        result << obj
    }
    
    println JsonOutput.prettyPrint(JsonOutput.toJson(result))
    
    • docParser.select("tr > th").collect { it.text() } collects the table headers and stores them as an ordered List<String>.
    • docParser.select("tr:has(td)") selects all rows that contain data cells, which excludes the header row.
    • tr.select("td").eachWithIndex iterates over the cells of each row and associates each value with its header by index i.
    • The last line prints the desired output to the console as JSON.
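
    The snippet above assumes docParser already exists. As a minimal, self-contained sketch of the same idea, the script below builds docParser with Jsoup.parse from a shortened copy of the report (two columns only) and pairs headers with cell values using transpose(). The jsoup version in @Grab and the shortened HTML are assumptions made for illustration, not part of the original script.

```groovy
@Grab('org.jsoup:jsoup:1.15.3') // version is an assumption; any recent jsoup should do
import org.jsoup.Jsoup
import groovy.json.JsonOutput

// Shortened copy of the report from the question (two columns only)
def html = '''
<table border=1>
  <tr><b><th bgcolor=#C0C0C0> TransactionName </th><th bgcolor=#C0C0C0> TotalCount </th></b></tr>
  <tr><td>1 /aumentum/</td><td><center>45</center></td></tr>
  <tr><td>T01_Aumentum_Home</td><td><center>213</center></td></tr>
</table>
'''

// Jsoup.parse builds a Document from an HTML string
def docParser = Jsoup.parse(html)

// Header names in column order; text() returns trimmed text
def headers = docParser.select('tr > th').collect { it.text() }

// One map per data row: pair each header with the cell in the same column
def result = docParser.select('tr:has(td)').collect { tr ->
    def cells = tr.select('td').collect { it.text() }
    [headers, cells].transpose().collectEntries { pair -> [(pair[0]): pair[1]] }
}

println JsonOutput.prettyPrint(JsonOutput.toJson(result))
```

    Using collect instead of each with an accumulator keeps the row-to-map step free of mutable state, but the behavior is intended to match the original example.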

    Output:

    [
        {
            "TransactionName": "1 /aumentum/",
            "AverageLatency": "1648.0",
            "MinimumLatency": "1240",
            "MaximumLatency": "2900",
            "AverageElapsedTime": "1907.0",
            "MinimumElapsedTime": "1495",
            "MaximumElapsedTime": "3140",
            "TotalCount": "45",
            "PassPercentage": "100.0"
        },
        {
            "TransactionName": "T01_Aumentum_Home",
            "AverageLatency": "6.0",
            "MinimumLatency": "1",
            "MaximumLatency": "10",
            "AverageElapsedTime": "1956.0",
            "MinimumElapsedTime": "1490",
            "MaximumElapsedTime": "3806",
            "TotalCount": "213",
            "PassPercentage": "0.0"
        }
    ]
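
    As for the IndexOutOfBoundsException in the question: the children of the first tr are the th cells, and a th that holds only text has no child elements, so element.child(0) has nothing to return. The sketch below reproduces this on a tiny table; the HTML string and the jsoup version in @Grab are assumptions made for illustration.

```groovy
@Grab('org.jsoup:jsoup:1.15.3') // version is an assumption; any recent jsoup should do
import org.jsoup.Jsoup

def doc = Jsoup.parse(
    '<table><tr><th> TransactionName </th></tr>' +
    '<tr><td><center>45</center></td></tr></table>')

def headerCell = doc.select('th').first()
def dataCell = doc.select('td').first()

// children() returns child *elements* only; the text inside <th> is a text node
assert headerCell.children().isEmpty()

// child(0) on an element without child elements fails, as in the question
def thrown = false
try {
    headerCell.child(0)
} catch (IndexOutOfBoundsException e) {
    thrown = true
}
assert thrown

// text() works whether or not the cell wraps its value in another element
assert headerCell.text() == 'TransactionName'
assert dataCell.child(0).tagName() == 'center'
assert dataCell.text() == '45'
```

    This is why the selector-based version calls text() on each cell directly instead of indexing into its children.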
    

    Here is the full Groovy script I used to experiment with your example: https://gist.github.com/wololock/651a536dff4e104ebba0eef69d4ac3ea

    I hope it helps.