
Parse JavaScript with jsoup


I want to read the value of a JavaScript variable from an HTML page.
Below is a snippet of the page:

<input id="hidval" value="" type="hidden"> 
<form method="post" style="padding: 0px;margin: 0px;" name="profile" autocomplete="off">
<input name="pqRjnA" id="pqRjnA" value="" type="hidden">
<script type="text/javascript">
    key="pqRjnA";
</script>

My aim is to read the value of the variable key from this page using jsoup.
Is this possible with jsoup? If so, how?


Solution

  • jsoup parses HTML but does not parse or execute JavaScript, so you have two ways to solve this:

    A. Use a JavaScript library

    • Pro:

      • Full JavaScript support
    • Con:

      • Additional library / dependencies

    B. Use Jsoup + manual parsing

    • Pro:

      • No extra libraries required
      • Enough for simple tasks
    • Con:

      • Not as flexible as a JavaScript library

    Here's an example of how to get the key with jsoup and some "manual" code:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;

    Document doc = ...
    Element script = doc.select("script").first(); // Get the first <script> element

    Pattern p = Pattern.compile("(?is)key=\"(.+?)\""); // Regex for the value of the key
    Matcher m = p.matcher(script.html()); // Use html() here and NOT text(): text() does not return the script content

    while( m.find() )
    {
        System.out.println(m.group());  // the whole match (key="value")
        System.out.println(m.group(1)); // value only
    }
    

    Output (using your HTML snippet):

    key="pqRjnA"
    pqRjnA
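
The regex step above could also be pulled into a small helper that takes the script text and a variable name. This is only a minimal sketch using the standard library: the class name ScriptVarReader and the method readStringVar are hypothetical names, not part of jsoup. It assumes the value is a plain quoted string literal without escaped quotes, and it additionally tolerates whitespace around the = sign and single-quoted values.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of jsoup: extracts a quoted string
// assigned to a named variable from a piece of JavaScript source.
final class ScriptVarReader {

    // Returns the string literal assigned to `name` in `scriptSource`,
    // or Optional.empty() if no such assignment is found.
    static Optional<String> readStringVar(String scriptSource, String name) {
        // \b           : do not match inside a longer identifier (e.g. "monkey")
        // ([\"'])      : opening quote, single or double (group 1)
        // (.*?)        : the value, matched lazily (group 2)
        // \1           : the same quote character that opened the literal
        Pattern p = Pattern.compile(
                "\\b" + Pattern.quote(name) + "\\s*=\\s*([\"'])(.*?)\\1");
        Matcher m = p.matcher(scriptSource);
        return m.find() ? Optional.of(m.group(2)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Stand-in for the text of the <script> element in the question.
        String js = "\n    key=\"pqRjnA\";\n";
        System.out.println(readStringVar(js, "key").orElse("(not found)")); // prints pqRjnA
    }
}
```

With jsoup, the input would be the string returned by script.html() (or script.data()) for the script element selected earlier. For anything beyond simple literal assignments like this one, option A is likely the safer route.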