I'm using some crawler code from http://code.google.com/p/crawler4j/.
Now, what I'm trying to do is access every URL found by the MyCrawler class from another class.
I start the crawler with:
// * Start the crawl. This is a blocking operation, meaning that your code
// * will reach the line after this only when crawling is finished.
controller.start(MyCrawler.class, numberOfCrawlers);
When I try to use "return" to get my URLs, I get this error:
The return type is incompatible with WebCrawler.visit(Page)
and it asks me to change the type to 'void' but, of course, I don't want to.
Here's the method I'm having trouble with:
@Override
public String visit(Page page) {
url = page.getWebURL().getURL();
System.out.println("URL: " + url);
if (page.getParseData() instanceof HtmlParseData) {
HtmlParseData htmlParseData = (HtmlParseData) page.getParseData();
String text = htmlParseData.getText();
String html = htmlParseData.getHtml();
List<WebURL> links = htmlParseData.getOutgoingUrls();
System.out.println("Text length: " + text.length());
System.out.println("Html length: " + html.length());
System.out.println("Number of outgoing links: " + links.size());
return url;
}
I also tried to use a getter, but since the crawl is a "blocking operation", it doesn't work. I'm running out of ideas.
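You can't change the return type when you override a method. WebCrawler.visit(Page) is declared to return void, so your override has to return void as well; a String return type is incompatible, which is the error you're seeing. (Changing the parameter list instead would give you a new, overloaded method rather than an override.) If all you want is the list of URLs you visited, store them in an ArrayList instead of returning them, and add a getter that returns the list.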
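Here is a minimal sketch of that idea. To keep it self-contained it does not depend on crawler4j: CrawlerBase is a hypothetical stand-in for WebCrawler, and visit takes a plain String instead of a Page. The list is static and synchronized on the assumption that the controller creates the crawler instances itself (you pass it MyCrawler.class, not an object) and may run several of them on separate threads, so a getter on one particular instance would not be reachable from your calling code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for crawler4j's WebCrawler, reduced to the one
// method that matters here. Note the void return type.
abstract class CrawlerBase {
    public abstract void visit(String url);
}

class CollectingCrawler extends CrawlerBase {
    // Shared by all crawler instances; synchronized because several
    // crawler threads may add to it concurrently (an assumption).
    private static final List<String> VISITED =
            Collections.synchronizedList(new ArrayList<>());

    @Override
    public void visit(String url) {
        // Store the URL instead of returning it, so the override
        // keeps the void return type of the parent method.
        VISITED.add(url);
    }

    // Returns a snapshot copy, so callers can iterate over it safely.
    public static List<String> getVisitedUrls() {
        synchronized (VISITED) {
            return new ArrayList<>(VISITED);
        }
    }
}
```

In the real crawler, the body of visit(Page page) would add page.getWebURL().getURL() to the list, and the other class would call the static getter once controller.start(...) has returned.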