
Jsoup library "Did not find balanced marker" error


Using the jsoup library, I am trying to get the href of an <a> element that contains a specified text, which is different each time.

Example:

import org.jsoup.Jsoup
import org.jsoup.nodes.Document
import org.jsoup.select.Elements

public class GlobVars {
    public static Document currentPageSource
    public static String currentTitle
}

def get_url() {
    String url = "https://www.website.com/"
    GlobVars.currentPageSource = Jsoup.connect(url).get()

    Elements wElements = GlobVars.currentPageSource.select('a[class="class-name"]:contains('+GlobVars.currentTitle+')')
    if(wElements) {
        /*
         * Do stuff...
         * 
         * */
    }
}

The problem occurs when GlobVars.currentTitle contains a single quote character. For example, if GlobVars.currentTitle is I am here, it works fine. But if GlobVars.currentTitle is I'm here, I get this error: Did not find balanced marker at 'I'.

I tried wrapping the GlobVars.currentTitle value in double quotes, triple single quotes, and triple double quotes, but I get the same error.

I also read https://github.com/jhy/jsoup/issues/1105, but the "trick" of escaping quotes cannot be used in my case.

Any idea how I can fix this?


Solution

Escape the special characters in the text before building the selector:

    // @Grab(group='org.jsoup', module='jsoup', version='1.14.3')
    
    import org.jsoup.Jsoup
    import org.jsoup.nodes.Document
    import org.jsoup.select.Elements
    
    def html = """
    <html>
    <body>
    <a class="c1" href="#1">i'm the one</a>
    <a class="c1" href="#2">i am the one</a>
    </body>
    </html>
    """
    
    def desiredText = "i'm the one"
    // Backslash-escape characters that the selector parser treats specially.
    // This set may not be complete; extend it if other characters cause errors.
    desiredText = desiredText.replaceAll(/(['"\\\/\|(\)\[\]])/, '\\\\$1')
    
    Document currentPageSource = Jsoup.parse(html)
    Elements wElements = currentPageSource.select('a[class="c1"]:contains('+ desiredText +')')
    

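Or filter the matched elements in Groovy instead of inside the selector, so that nothing needs escaping: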

    def desiredText = "i'm the one"
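    // text() compares against the visible text; html() would return entities such as &amp;.
    // Note that String.contains is case-sensitive, while jsoup's :contains is not.
    Elements wElements = currentPageSource.select('a[class="c1"]').findAll{ it.text().contains(desiredText) }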
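
For anyone calling jsoup from Java rather than Groovy, the escaping step of the first variant can be sketched with the standard library alone. This is only a sketch, not part of the jsoup API: `SelectorEscape`, `escapeForContains`, and `containsSelector` are hypothetical names, and it assumes (as the Groovy version above does) that the selector parser accepts a backslash in front of these characters. The character set mirrors the Groovy regex and may need extending.

```java
import java.util.regex.Pattern;

/** Sketch: backslash-escape characters that can unbalance a :contains(...) argument. */
final class SelectorEscape {

    // Same character set as the Groovy regex above: ' " \ / | ( ) [ ]
    private static final Pattern SPECIAL = Pattern.compile("['\"\\\\/|()\\[\\]]");

    /** Hypothetical helper: prefixes every special character with one backslash. */
    static String escapeForContains(String text) {
        // In a replacement string, "\\\\" produces one literal backslash
        // and "$0" stands for the whole match.
        return SPECIAL.matcher(text).replaceAll("\\\\$0");
    }

    /** Builds the selector used in the answer; only the text part is escaped. */
    static String containsSelector(String cssClass, String text) {
        return "a[class=\"" + cssClass + "\"]:contains(" + escapeForContains(text) + ")";
    }

    public static void main(String[] args) {
        // prints: a[class="c1"]:contains(i\'m the one)
        System.out.println(containsSelector("c1", "i'm the one"));
        // prints: a[class="c1"]:contains(i am the one)
        System.out.println(containsSelector("c1", "i am the one"));
    }
}
```

The returned string would then be passed to `select(...)` exactly as in the Groovy snippet. Text without special characters passes through unchanged, so the helper can be applied to every title unconditionally.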