
jsoup parsing issue with frameset and id attributes


I have source code like this:

<html>
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
        <title>SCHOOL-100</title> 
    </head>  
    <frameset rows="111, 1*" border=0>    
        <frame name=top src="top.cgi" marginwidth=0 marginheight=0 noresize scrolling="no"> 
        <frameset cols="200, 1*">     
            <frame name=left namo_target_frame=right src="left.cgi" scrolling=yes>     
            <frame name=right namo_target_frame=_self src="LTE_info.cgi">   
        </frameset>         
        <noframes>         
            <body bgcolor=white text=black link=blue vlink=purple alink=red>  
                <input type=hidden id=age value="12" >  
                <input type=hidden id=class value="9" > 
                <p> </p>    
            </body>      
        </noframes>     
    </frameset>         
</html> 

I'm fetching the data from the url. I have tried:

// Jsoup.connect needs an absolute URL; the http:// scheme here is an assumption.
Document doc = Jsoup.connect("http://mobile.testmifi/cgi-bin/frame_main.cgi").get();
Elements media = doc.select("noframes");
for (Element src : media) {
    //System.out.println("media source is ---- " + src.text());
}

The issue I'm facing is that I can't get past the noframes node. I want to fetch the values of the inputs with id=age and id=class. Everything inside noframes comes back as a string, not as nodes. If I call getElementsByAttribute("id"), it finds nothing.

I need to fetch the values of age and class (the ids) using jsoup. Any help is appreciated, thanks in advance.


Solution

  • The problem here is that jsoup does not parse the content of noframes into elements: the HTML inside the tag is kept as plain text, which is why the inputs come back as a string and selecting by id finds nothing. If you just want the values of age and class, you can take the text of the noframes tag, parse it as a body fragment, and then read the values from the result. For example:

    import java.io.File;
    import java.io.IOException;
    
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    
    
    public class JsoupParser4 {
    
        public static void main(String[] args) {
            try {
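                // Parse the page saved locally as mob.html
                Document doc = Jsoup.parse(new File("mob.html"), "UTF-8");
                // The noframes content is plain text, so parse that text again as HTML
                Document noFramesDoc = Jsoup.parseBodyFragment(doc.select("noframes").text());
                // The inputs are now real elements and can be selected by id
                System.out.println("Age = " + noFramesDoc.select("input[id=age]").attr("value"));
                System.out.println("Class = " + noFramesDoc.select("input[id=class]").attr("value"));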
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    
    }
    

    The mob.html file contains the HTML from your question.
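
    If it helps, here is a rough sketch of the same re-parse approach wrapped in a helper that collects every input with an id from noframes, so it works on a Document whether it came from a file, a string, or Jsoup.connect(...).get(). The class name NoFramesValues and the method readNoFramesInputs are made up for illustration, and the inline HTML is a shortened copy of the page in the question. The fallback to data() is only a precaution, on the assumption that some jsoup versions may store the raw content as data rather than text.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class NoFramesValues {

    // Hypothetical helper: maps id -> value for each <input> with an id found inside <noframes>.
    static Map<String, String> readNoFramesInputs(Document doc) {
        Map<String, String> values = new LinkedHashMap<>();
        for (Element noframes : doc.select("noframes")) {
            // The markup inside noframes is held as raw text, not as child elements.
            String raw = noframes.text();
            if (raw.isEmpty()) {
                // Precaution (assumption): use data() if the content is stored as a data node.
                raw = noframes.data();
            }
            // Parse the raw markup again so the inputs become real elements.
            Document inner = Jsoup.parseBodyFragment(raw);
            for (Element input : inner.select("input[id]")) {
                values.put(input.id(), input.attr("value"));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Shortened version of the page from the question.
        String html = "<html><head><title>SCHOOL-100</title></head>"
                + "<frameset rows=\"111, 1*\" border=0>"
                + "<frame name=top src=\"top.cgi\">"
                + "<noframes><body bgcolor=white>"
                + "<input type=hidden id=age value=\"12\" >"
                + "<input type=hidden id=class value=\"9\" >"
                + "</body></noframes>"
                + "</frameset></html>";

        Map<String, String> values = readNoFramesInputs(Jsoup.parse(html));
        System.out.println("Age = " + values.get("age"));
        System.out.println("Class = " + values.get("class"));
    }
}
```

    For the live page you would pass the result of Jsoup.connect(...).get() to the helper instead of parsing a string.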