Tags: vagrant, virtual-machine, cloudera, ubuntu-16.04, vagrant-plugin

Virtual Cluster with Vagrant behind corporate proxy


I’m trying to run a virtual Apache Hadoop cluster on my laptop using Vagrant and Cloudera Manager, following these instructions:

http://blog.cloudera.com/blog/2014/06/how-to-install-a-virtual-apache-hadoop-cluster-with-vagrant-and-cloudera-manager/

I’m using a Dell Precision M4800 Workstation laptop with 16GB of RAM running Ubuntu 16.04 LTS (Xenial Xerus).

I successfully installed VirtualBox and Vagrant, but I can’t connect to the nodes of my cluster. Here is what I did:

  1. Configure the proxy settings for CLI tools:

    $ export http_proxy="http://user:password@proxy_server:port"
    $ export https_proxy="https://user:password@proxy_server:port"

  2. Go into the project directory.

  3. Update the hosts file on each active machine:

    $ vagrant hostmanager

  4. Create and configure the guest machines according to the Vagrantfile:

    $ vagrant up

  5. Try to open http://vm-cluster-node1:7180 in the browser; this fails with a “server not found” error.
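A “server not found” error in step 5 often means the laptop itself cannot resolve vm-cluster-node1. A minimal sketch for checking that from the host, assuming a Linux host with getent available; check_hosts is a hypothetical helper and vm-cluster-node1 is the only node name taken from the question. Note that vagrant-hostmanager edits the host’s /etc/hosts only when its manage_host option is enabled, and if the browser sends everything through the corporate proxy, the proxy may not be able to resolve this private name either.

```shell
#!/usr/bin/env bash
# Sketch: check that the host machine can resolve the cluster node names.
# check_hosts is a hypothetical helper; pass your own node names to it.

# Print "<name> -> <address>" for each resolvable name, report the others
# on stderr, and return non-zero if at least one name did not resolve.
check_hosts() {
  local name addr status=0
  for name in "$@"; do
    # getent follows /etc/nsswitch.conf: normally /etc/hosts first, then DNS.
    addr=$(getent hosts "$name" | awk '{ print $1; exit }')
    if [ -n "$addr" ]; then
      echo "$name -> $addr"
    else
      echo "$name: not resolvable on this machine" >&2
      status=1
    fi
  done
  return "$status"
}
```

Usage: check_hosts vm-cluster-node1 (add the other node names from the Vagrantfile). A non-zero exit status means at least one name could not be resolved on the laptop.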

Since I am behind a corporate proxy, I installed the vagrant-proxyconf plugin, as suggested in “How to use vagrant in a proxy environment?”, and then changed my Vagrantfile by adding the following lines:

    if Vagrant.has_plugin?("vagrant-proxyconf")
      config.proxy.http     = "http://user:password@proxy_server:port"
      config.proxy.https    = "https://user:password@proxy_server:port"
      config.proxy.no_proxy = "localhost,127.0.0.1"
    end
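With no_proxy set to "localhost,127.0.0.1", any other host, such as archive.cloudera.com or vm-cluster-node1, is reached through the proxy by tools that honor these variables. The sketch below is a simplified model of how a comma-separated no_proxy list is matched (exact name, or domain suffix for entries like .cloudera.com); bypasses_proxy is a hypothetical helper, not part of vagrant-proxyconf, and real tools differ in details such as wildcards, ports and address ranges.

```shell
#!/usr/bin/env bash
# Simplified model of no_proxy matching (a sketch, not any tool's exact rules).
# bypasses_proxy HOST LIST: returns 0 if HOST should skip the proxy,
# 1 if the request would be sent through the proxy.
bypasses_proxy() {
  local host="$1" list="$2" entry
  local -a entries
  # Split the comma-separated list into an array (no glob expansion).
  IFS=',' read -r -a entries <<< "$list"
  for entry in "${entries[@]}"; do
    entry="${entry// /}"            # drop spaces around commas
    [ -z "$entry" ] && continue
    # Exact match, e.g. "localhost" or "127.0.0.1".
    [ "$host" = "$entry" ] && return 0
    # Domain suffix match: ".example.com" or "example.com" covers a.example.com.
    case "$host" in
      *".${entry#.}") return 0 ;;
    esac
  done
  return 1
}
```

For example, bypasses_proxy vm-cluster-node1 "localhost,127.0.0.1" returns 1 (the request is proxied), while adding vm-cluster-node1 to the list makes it return 0.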

The problem now is that after running vagrant up I get the following error:

==> master: Failed to fetch http://archive.cloudera.com/cm5/ubuntu/precise/amd64/cm/pool/contrib/e/enterprise/cloudera-manager-daemons_5.8.2-1.cm582.p0.17~precise-cm5_all.deb  Connection failed
==> master: Failed to fetch http://archive.cloudera.com/cm5/ubuntu/precise/amd64/cm/pool/contrib/o/oracle-j2sdk1.7/oracle-j2sdk1.7_1.7.0+update67-1_amd64.deb  Connection failed
==> master: E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?
==> master: cloudera-scm-server-db: unrecognized service
==> master: cloudera-scm-server-db: unrecognized service
==> master: cloudera-scm-server: unrecognized service
The SSH command responded with a non-zero exit status. Vagrant assumes 
that this means the command failed. The output for this command should be 
in the log above. Please read the output to determine what went wrong.

What am I doing wrong?


Solution

  • It turned out that it wasn't a proxy configuration problem (the proxy configuration was correct) but a corporate firewall problem: the firewall allows only certain packages to be downloaded.

    I "solved" the problem by installing Cloudera Manager while using my cellphone as a hotspot.

    Once Cloudera Manager and the Hadoop stack are installed on your cluster, you can use the Cloudera Manager web GUI to manage your cluster in the corporate environment.

    The only remaining problem is that some important cluster features, such as clock synchronization, don't work properly in the corporate environment. In particular, I found that my company's firewall blocks NTP (the problem is described in more detail here: https://askubuntu.com/questions/429306/ntpdate-no-server-suitable-for-synchronization-found).
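One way to tell a proxy problem apart from a firewall policy like this is to request one of the failing package URLs from the apt log through the same proxy and look at the HTTP status. A minimal sketch, assuming curl is installed; describe_status and probe are hypothetical helpers, the mapping from status codes to causes is only a rough heuristic, and user:password@proxy_server:port is the placeholder used in the question.

```shell
#!/usr/bin/env bash
# Sketch: check whether a URL is reachable through the proxy and give a
# rough (heuristic) reading of the HTTP status code.

# describe_status CODE: print a short diagnosis for an HTTP status code.
# curl reports 000 in %{http_code} when no HTTP response was received.
describe_status() {
  case "$1" in
    2??|3??) echo "reachable" ;;
    407) echo "proxy authentication required: check user and password" ;;
    403) echo "forbidden: the proxy or a firewall policy may block this URL" ;;
    000) echo "no HTTP response: connection failed or was dropped" ;;
    *) echo "unexpected HTTP status $1" ;;
  esac
}

# probe PROXY URL: send a HEAD request for URL through PROXY and print the
# diagnosis. "|| true" keeps a failed connection from aborting under set -e;
# curl still prints 000 in that case.
probe() {
  local proxy="$1" url="$2" code
  code=$(curl -s -I -o /dev/null --max-time 20 -x "$proxy" -w '%{http_code}' "$url") || true
  echo "$url: $(describe_status "$code")"
}
```

For example: probe "http://user:password@proxy_server:port" "http://archive.cloudera.com/cm5/ubuntu/precise/amd64/cm/pool/contrib/o/oracle-j2sdk1.7/oracle-j2sdk1.7_1.7.0+update67-1_amd64.deb". If a URL on a site the company allows comes back as reachable while this one does not, the proxy settings are fine and the download is being blocked by policy. NTP is a different case: it uses UDP port 123 and is not carried by an HTTP proxy at all, so no proxy setting can work around a firewall that blocks it.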