
Determining parameters on crawler4j


I am trying to use crawler4j as shown in this example, but no matter how I set the number of crawlers or change the root folder, I keep getting this message from the code:

"Needed parameters: rootFolder (it will contain intermediate crawl data) numberOfCralwers (number of concurrent threads)"

The main code is below:

    public class Controller {

        public static void main(String[] args) throws Exception {

            if (args.length != 2) {
                System.out.println("Needed parameters: ");
                System.out.println("\t rootFolder (it will contain intermediate crawl data)");
                System.out.println("\t numberOfCralwers (number of concurrent threads)");
                return;
            }

            /*
             * crawlStorageFolder is a folder where intermediate crawl data is
             * stored.
             */
            String crawlStorageFolder = args[0];

            /*
             * numberOfCrawlers shows the number of concurrent threads that should
             * be initiated for crawling.
             */
            int numberOfCrawlers = Integer.parseInt(args[1]);

            // ... (rest of the crawler setup omitted)
        }
    }

There was a similar question asking exactly what I want to know here, but I didn't quite understand the solution, for example where I was supposed to type java BasicCrawler Controller "arg1" "arg2". I am running this code in Eclipse and I am still fairly new to programming. I would really appreciate it if someone could help me understand this problem.
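For context, here is a minimal sketch (not from crawler4j; the class name ArgsDemo and the values /tmp/crawl and 7 are only examples) of where args comes from. Whatever follows the class name on the command line, e.g. java ArgsDemo /tmp/crawl 7, ends up in args. In Eclipse the same values go in Run > Run Configurations..., Arguments tab, Program arguments field; a plain Run As > Java Application with that field empty passes no arguments at all.

```java
// Minimal sketch: shows how command-line words arrive in main's args array.
// The folder name and thread count used below are example values only.
public class ArgsDemo {

    // Mirrors the check in the Controller example: exactly two arguments are expected.
    static String describe(String[] args) {
        if (args.length != 2) {
            return "expected 2 arguments, got " + args.length;
        }
        return "rootFolder=" + args[0] + ", numberOfCrawlers=" + Integer.parseInt(args[1]);
    }

    public static void main(String[] args) {
        // Terminal: java ArgsDemo /tmp/crawl 7
        // Eclipse:  Run > Run Configurations... > Arguments > Program arguments: /tmp/crawl 7
        System.out.println(describe(args));
    }
}
```

Launched with an empty Program arguments field, this prints "expected 2 arguments, got 0", which is the same situation that makes Controller print the "Needed parameters" message.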


Solution

  • If you run the file without giving it any arguments (which is what happens when you simply press Run in Eclipse), args.length is 0, so the check fails and you get that message. Either comment out the following block or delete it:

    if (args.length != 2) {
        System.out.println("Needed parameters: ");
        System.out.println("\t rootFolder (it will contain intermediate crawl data)");
        System.out.println("\t numberOfCralwers (number of concurrent threads)");
        return;
    }

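    After that, set the values directly in the code instead of reading them from args: point crawlStorageFolder at the folder where you want the intermediate crawl data stored, and give numberOfCrawlers a fixed number. Otherwise args[0] and args[1] will throw an ArrayIndexOutOfBoundsException once the check is gone.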
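A variation on this answer, sketched here as an assumption rather than the answerer's exact code: keep reading args when they are supplied, but fall back to hard-coded defaults when they are not, so the program runs from Eclipse without configuration and still accepts command-line arguments. The defaults /data/crawl/root and 7 are example values (the same ones the crawler4j README uses); replace them with a folder you can write to and a thread count that suits you. The helper names storageFolder and numberOfCrawlers are hypothetical.

```java
public class Controller {

    // Example defaults used when no program arguments are supplied.
    static final String DEFAULT_STORAGE_FOLDER = "/data/crawl/root";
    static final int DEFAULT_NUMBER_OF_CRAWLERS = 7;

    // Hypothetical helper: first argument if present, otherwise the default folder.
    static String storageFolder(String[] args) {
        return args.length >= 1 ? args[0] : DEFAULT_STORAGE_FOLDER;
    }

    // Hypothetical helper: second argument if present, otherwise the default thread count.
    static int numberOfCrawlers(String[] args) {
        return args.length >= 2 ? Integer.parseInt(args[1]) : DEFAULT_NUMBER_OF_CRAWLERS;
    }

    public static void main(String[] args) throws Exception {
        String crawlStorageFolder = storageFolder(args);
        int numberOfCrawlers = numberOfCrawlers(args);
        System.out.println("rootFolder = " + crawlStorageFolder);
        System.out.println("numberOfCrawlers = " + numberOfCrawlers);
        // ... continue with the rest of the crawler setup from the example here
    }
}
```

With this version there is no early return, so the "Needed parameters" message never appears; passing two program arguments still overrides both defaults.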