I use a single-node Hadoop & YARN cluster. All Hadoop and YARN daemons run on this node. On it I run the fetch step of an Apache Nutch 1.15 distributed crawl; the inject and generate steps finished successfully.
I am trying to run the Firefox browser inside a map task, which runs in a YarnChild container, using the Selenium 3.141.54 FirefoxDriver. The Firefox process starts, but a window pops up saying that the Firefox profile is missing or inaccessible, and the map task is blocked until I close the window.
Selenium 3.141.54 FirefoxDriver uses geckodriver to get Firefox started, and from the geckodriver output, which is in the stderr log from container userlog, I see that it attempted to run Firefox using the command:
1557726792743 mozrunner::runner INFO Running command: "/usr/bin/firefox" "-marionette" "-profile" "/tmp/rust_mozprofile.0dQXae46ZwUd" "-foreground" "-no-remote"
What I am noticing, and correct me if I am wrong, is that inside a map task I have access to binaries from the host's local file system, such as /usr/bin/firefox. Yet when FirefoxDriver starts Firefox inside the map task with the command above (through geckodriver, which also sits in the host's local file system), the Firefox process cannot see the "/tmp/rust_mozprofile.0dQXae46ZwUd" directory, which sits in /tmp on the host's local file system.
I've tried to set FirefoxDriver to use a profile which sits in hdfs, that has the same information from a temporary profile from /tmp, but the window which says that the profile is missing or is inaccessible still appears.
I've tried to read a file from host local fs /tmp, using hadoop LocalFileSystem API, from within the map task, and I could read it, so I have access to local fs from the map task.
Knowing all of this, I cannot understand why geckodriver cannot start Firefox using the profile from /tmp.
The following is the main code. It runs somewhere in the nested function calls under getProtocolOutput, called from FetcherThread, which is started in the FetcherRun mapper. Simply put, the following code runs in a specific thread started in a Mapper:
profile = new FirefoxProfile();
boolean enableFlashPlayer = conf.getBoolean("selenium.firefox.enable.flash", false);
int loadImage = conf.getInt("selenium.firefox.load.image", 1);
int loadStylesheet = conf.getInt("selenium.firefox.load.stylesheet", 1);
System.setProperty("webdriver.gecko.driver", conf.get("webdriver.gecko.driver"));
profile.setPreference("dom.ipc.plugins.enabled.libflashplayer.so", enableFlashPlayer);
profile.setPreference("permissions.default.stylesheet", loadStylesheet);
profile.setPreference("permissions.default.image", loadImage);
profile.setPreference("marionette", false);
profile.setAcceptUntrustedCertificates(true);
long firefoxBinaryTimeout = conf.getLong("selenium.firefox.binary.timeout", 45);
binary = new FirefoxBinary();
binary.setTimeout(TimeUnit.SECONDS.toMillis(firefoxBinaryTimeout));
binary.addCommandLineOptions("-profile", "/home/iulian/firefox.profile");
options = new FirefoxOptions();
options.setBinary(binary).setProfile(profile);
driver = new FirefoxDriver(options); // execution stops here and the window appears, saying that the Firefox profile is missing or inaccessible
System.out.println("Finished starting driver.");
long pageLoadWait = conf.getLong("libselenium.page.load.delay", 10);
driver.manage().timeouts().pageLoadTimeout(pageLoadWait, TimeUnit.SECONDS);
Where do you think the problem is? Is there a simple way to debug access to /tmp?
Thanks in advance!
I've managed to find the problem.
When Firefox starts, its environment needs a HOME variable that points to a valid directory: the home directory of the user who starts the browser, because Firefox creates its profile-related files there.
In my case, the problem was that inside the Hadoop map task the HOME environment variable was just "HOME=/home/". The user who ran the map task, and therefore the Firefox browser, does not have write permission on that directory. Consequently, the pop-up appeared every time because Firefox could not create its profile-related files in the HOME directory.
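To check for this kind of problem from inside a task, a small diagnostic can help. The following is a minimal sketch, not part of Nutch, Hadoop, or Selenium: the class name HomeCheck and its methods are hypothetical, and it uses only the Java standard library. It reports whether HOME and the temp directories are existing directories in which the current process can actually create a file. Output goes to stderr, so in a map task it should land in the same container stderr log as the geckodriver output.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, for illustration only.
class HomeCheck {

    // True only if dir is an existing directory and this process can
    // create (and then delete) a file inside it.
    static boolean isWritableDir(String dir) {
        if (dir == null || dir.isEmpty()) {
            return false;
        }
        Path path = Paths.get(dir);
        if (!Files.isDirectory(path)) {
            return false;
        }
        try {
            Path probe = Files.createTempFile(path, "probe", ".tmp");
            Files.delete(probe);
            return true;
        } catch (IOException | SecurityException e) {
            // Typically a permission problem on the directory.
            return false;
        }
    }

    // One-line report for a labelled directory.
    static String describe(String label, String dir) {
        return label + "=" + dir + " writable=" + isWritableDir(dir);
    }

    public static void main(String[] args) {
        // HOME as seen by this JVM; child processes such as geckodriver
        // and Firefox inherit it unless it is overridden at launch.
        System.err.println(describe("HOME", System.getenv("HOME")));
        System.err.println(describe("java.io.tmpdir", System.getProperty("java.io.tmpdir")));
        System.err.println(describe("/tmp", "/tmp"));
    }
}
```

Calling HomeCheck.main (or describe) from the mapper just before the FirefoxDriver is created would, in a situation like the one described above, be expected to report HOME as not writable while /tmp is writable.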