When I try to parse an HTML page from a website, my app crashes with this error:
java.io.IOException: Mark has been invalidated.
Part of my code:
    String xml = xxxxxx;
    try {
        Document document = Jsoup.connect(xml)
                .maxBodySize(1024 * 1024 * 10)
                .timeout(0)
                .ignoreContentType(true)
                .parser(Parser.xmlParser())
                .get();
        Elements elements = document.body().select("td.hotv_text:eq(0)");
        for (Element element : elements) {
            Element element1 = element.select("a[href].hotv_text").first();
            hashMap.put(element.text(), element1.attr("abs:href"));
        }
    } catch (HttpStatusException ex) {
        Log.i("GyWueInetSvc", "Exception while JSoup connect:" + xml + " cause:" + ex.getMessage());
    } catch (IOException e) {
        e.printStackTrace();
        throw new RuntimeException("Socket timeout: " + e.getMessage(), e);
    }
The page I want to parse is about 2 MB. When I step through the code in the debugger, execution reaches this method in ConstrainableInputStream.java:
    public void reset() throws IOException {
        super.reset();
        remaining = maxSize - markpos;
    }
At that point markpos is -1, and the exception is thrown.
How can I solve this problem?
I found a solution. The problem was that the buffer was being overloaded. I solved it by reading the page content myself and passing it to Jsoup.parse, using the code below:
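    StringBuilder content = new StringBuilder();
    try {
        URLConnection connection = new URL(xml).openConnection();
        // Read the whole response line by line instead of letting Jsoup buffer it.
        try (Scanner scanner = new Scanner(connection.getInputStream())) {
            while (scanner.hasNextLine()) {
                // Scanner strips the line terminator, so add it back.
                content.append(scanner.nextLine()).append('\n');
            }
        }
    } catch (MalformedURLException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
    // Pass the page URL as the base URI so that "abs:href" can still resolve relative links.
    Document document = Jsoup.parse(content.toString(), xml);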
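For anyone wondering what "Mark has been invalidated" and markpos = -1 mean: the sketch below reproduces the same kind of failure with the standard java.io.BufferedInputStream only, no Jsoup involved. It assumes that ConstrainableInputStream inherits its mark/reset behaviour from BufferedInputStream, which the super.reset() call and the markpos field suggest but which I have not verified for every Jsoup version. The class name MarkInvalidationDemo and the helper resetSucceeds are made up for this illustration, and the exception message differs between Android and a desktop JDK.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

final class MarkInvalidationDemo {

    // Hypothetical helper: marks the stream with the given read limit, reads
    // bytesToRead bytes one at a time, then reports whether reset() still works.
    static boolean resetSucceeds(int bytesToRead, int readLimit) {
        byte[] data = new byte[64];
        // A deliberately tiny 8-byte buffer so the effect shows up quickly.
        try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(data), 8)) {
            in.mark(readLimit);
            for (int i = 0; i < bytesToRead; i++) {
                in.read();
            }
            // Throws IOException if the mark was dropped while refilling the buffer.
            in.reset();
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("read 4, limit 4:   " + resetSucceeds(4, 4));
        System.out.println("read 20, limit 4:  " + resetSucceeds(20, 4));
        System.out.println("read 20, limit 32: " + resetSucceeds(20, 32));
    }
}
```

In this sketch, reading well past the read limit forces the stream to refill its buffer and drop the mark, so the later reset() fails. Staying within the limit, or choosing a larger limit, keeps the mark valid. Reading the response yourself and calling Jsoup.parse on the resulting string, as in my fix above, avoids the mark/reset step entirely.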