I have a quick question that I am having a hard time figuring out. I want to read an HTML file line by line, but I want to skip the HEAD section. I figured I could start saving the text only once I am past the HEAD tag.
So far I have created:
BufferedReader reader = new BufferedReader(new InputStreamReader(socket.getInputStream()));
StringBuilder string = new StringBuilder();
String line;
while ((line = reader.readLine()) != null) {
    if (line.startsWith("<html>"))
        string.append(line + "\n");
}
I want to save the html code in memory without the HEAD information.
Example:
<HTML>
<HEAD>
<TITLE>Your Title Here</TITLE>
</HEAD>
<BODY BGCOLOR="FFFFFF">
<CENTER><IMG SRC="clouds.jpg" ALIGN="BOTTOM"> </CENTER>
<a href="http://somegreatsite.com">Link Name</a>is a link to another nifty site
<H1>This is a Header</H1>
<H2>This is a Medium Header</H2>
Send me mail at <a href="mailto:[email protected]">[email protected]</a>.
</BODY>
I want to save everything except the HEAD tag and its contents.
How about something like this -
boolean htmlFound = false; // Have we found an open html tag?
StringBuilder string = new StringBuilder(); // Back to your code...
String line;
while ((line = reader.readLine()) != null) {
    if (!htmlFound) { // Have we found it yet?
        if (line.toLowerCase().startsWith("<html")) { // Check if this line opens a html tag...
            htmlFound = true; // yes? Excellent!
        } else {
            continue; // Skip over this line...
        }
    }
    System.out.println("This is each line: " + line);
    string.append(line + "\n");
}
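The loop above starts saving at the opening html tag, but it still keeps the HEAD lines. To drop those as well, the same flag idea can be reused for the HEAD block. Here is a rough sketch, not a definitive solution: the class name HeadSkipper and the method readWithoutHead are made up for illustration, and it assumes the opening HEAD tag starts its own line, as in your example. For HTML that is not laid out this neatly, a real HTML parser would be more robust.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.Locale;

// Hypothetical helper class, not part of the original code.
class HeadSkipper {

    // Returns every line from the reader except those from <head> through </head>.
    // Assumption: the opening <head> tag is the first thing on its line.
    static String readWithoutHead(BufferedReader reader) throws IOException {
        StringBuilder html = new StringBuilder();
        boolean inHead = false; // Are we currently inside the HEAD block?
        String line;
        while ((line = reader.readLine()) != null) {
            String lower = line.trim().toLowerCase(Locale.ROOT);
            // Match <head> or <head ...> but not an HTML5 <header> tag.
            if (!inHead && (lower.startsWith("<head>") || lower.startsWith("<head "))) {
                inHead = true;
            }
            if (!inHead) {
                html.append(line).append("\n");
            }
            // The line holding </head> is dropped too; saving resumes on the next line.
            if (inHead && lower.contains("</head>")) {
                inHead = false;
            }
        }
        return html.toString();
    }
}
```

With the reader from your question you would call it as String html = HeadSkipper.readWithoutHead(reader); and the result holds the document without the HEAD block. If you also need to skip anything that arrives before the opening html tag (HTTP response headers, for example), combine this with the htmlFound check above.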