Tags: hadoop, hawq

Apache HAWQ installation built on top of HDFS


I would like to install Apache HAWQ on top of Hadoop.

Before installing HAWQ, I need to install Hadoop and configure all my nodes.

I have four nodes, listed below, and my question follows.

Should I install a Hadoop distribution on hawq-master?

1. hadoop-master   //Namenode, Secondary Namenode, ResourceManager, HAWQ Standby
2. hawq-master     //HAWQ Master
3. datanode01      //Datanode, HAWQ Segment
4. datanode02      //Datanode, HAWQ Segment

I wrote the role of each node next to its name above. In my opinion, I should install Hadoop on hadoop-master, datanode01, and datanode02, setting hadoop-master as the namenode (master) and the others as datanodes (slaves). Then I will install Apache HAWQ on all the nodes, setting hawq-master as the master node, hadoop-master as the HAWQ Standby, and the other two nodes as HAWQ segments.

What I want is to install HAWQ on top of Hadoop. So I think hawq-master should be built on top of Hadoop, but it has no connection with hadoop-master.

If I follow the procedure above, I don't think I have to install a Hadoop distribution on hawq-master. Is my understanding correct for successfully installing HAWQ on top of Hadoop?

If Hadoop should be installed on hawq-master, then which one is correct?

1. `hawq-master` should be set as a `namenode`.
2. `hawq-master` should be set as a `datanode`.

Any help will be appreciated.


Solution

  • Honestly, there are no strict constraints on how Hadoop and HAWQ are installed, as long as they are configured correctly.

    Regarding your concern, "I think the hawq-master should be built on top of hadoop, but there are no connection with hadoop-master": in my opinion, it should be "HAWQ should be built on top of Hadoop". We configure the HAWQ master's configuration file (hawq-site.xml) so that HAWQ can connect to Hadoop.
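
    As an illustrative sketch of what that connection looks like (the property names are the standard ones from hawq-site.xml, but the host names, port, and HDFS path here are assumptions matching the four-node layout in the question), the key settings that point HAWQ at HDFS could look like:

    ```xml
    <!-- hawq-site.xml on hawq-master: a hedged sketch, not a complete file.
         hadoop-master:8020 is assumed to be the HDFS namenode RPC address;
         check fs.defaultFS in your core-site.xml for the actual value. -->
    <property>
      <name>hawq_master_address_host</name>
      <value>hawq-master</value>
    </property>
    <property>
      <name>hawq_master_address_port</name>
      <value>5432</value>
    </property>
    <property>
      <name>hawq_dfs_url</name>
      <!-- namenode host:port plus the HDFS directory HAWQ stores data under -->
      <value>hadoop-master:8020/hawq_data</value>
    </property>
    ```

    Note that hawq_dfs_url points at the namenode (hadoop-master in this layout), which is how hawq-master reaches HDFS without running the Hadoop daemons locally.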

    Usually, we could install each master component (the HAWQ master and the Hadoop master) on its own node, but we could also co-locate some of them to save nodes. For the HDFS datanode and the HAWQ segment, however, we often install them together. Taking the workload of each machine into account, we could install them as below:

                         hadoop             hawq
     hadoop-master       namenode           hawq standby
     hawq-master         secondarynamenode  hawq master
     other node          datanode           segment
    

    If you configure HAWQ with YARN integration, there will also be a ResourceManager and NodeManagers in the cluster.

                         hadoop role                hawq role
     hadoop-master       namenode                   hawq standby
     hawq-master         snamenode,resourcemanager  hawq master
     other node          datanode, nodemanager      segment
    
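
    To enable that integration, a few hawq-site.xml properties switch HAWQ's resource management from standalone mode to YARN. A hedged sketch (the host and ports are assumptions based on placing the ResourceManager on hawq-master; check yarn-site.xml for the actual addresses):

    ```xml
    <!-- Sketch only: point HAWQ at the YARN ResourceManager. -->
    <property>
      <name>hawq_global_rm_type</name>
      <!-- "none" means standalone resource management; "yarn" enables YARN -->
      <value>yarn</value>
    </property>
    <property>
      <name>hawq_rm_yarn_address</name>
      <!-- assumed ResourceManager RPC address -->
      <value>hawq-master:8032</value>
    </property>
    <property>
      <name>hawq_rm_yarn_scheduler_address</name>
      <!-- assumed ResourceManager scheduler address -->
      <value>hawq-master:8030</value>
    </property>
    ```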

    Installing components together does not mean they are connected; it is your configuration files that let them reach each other. You can install all the master components together, but that may be too heavy for one machine. Read more about Apache HAWQ at http://incubator.apache.org/projects/hawq.html and find the docs at http://hdb.docs.pivotal.io/211/hdb/index.html.

    Besides, you can subscribe to the dev and user mailing lists: send an email to [email protected] / [email protected] to subscribe, and send emails to [email protected] / [email protected] to ask questions.