I would like to install Apache HAWQ on top of Hadoop.
Before installing HAWQ, I need to install Hadoop and configure all my nodes.
I have four nodes, as below, and my question is: should I install a Hadoop distribution on hawq-master?
1. hadoop-master  // namenode, secondary namenode, resourcemanager, HAWQ standby
2. hawq-master    // HAWQ master
3. datanode01     // datanode, HAWQ segment
4. datanode02     // datanode, HAWQ segment
I wrote the role of each node next to its name above.
In my opinion, I should install Hadoop on hadoop-master, datanode01 and datanode02, setting hadoop-master as the namenode (master) and the other two as datanodes (slaves). Then I will install Apache HAWQ on all the nodes, with hawq-master as the master node, hadoop-master as the HAWQ standby, and the other two nodes as HAWQ segments.
What I want is to install HAWQ on top of Hadoop. I think hawq-master should be built on top of Hadoop, but it would have no connection with hadoop-master.
If I follow the procedure above, I don't think I need to install a Hadoop distribution on hawq-master. Is my reasoning correct for successfully installing HAWQ on top of Hadoop?
If Hadoop should be installed on hawq-master, which one is correct?
1. `hawq-master` should be set as a `namenode`.
2. `hawq-master` should be set as a `datanode`.
Any help will be appreciated.
Honestly, there are no strict constraints on how Hadoop and HAWQ are installed, as long as they are configured correctly.
Regarding your concern, "I think the hawq-master should be built on top of hadoop, but there are no connection with hadoop-master": in my opinion, it should be "HAWQ should be built on top of Hadoop". The connection between HAWQ and Hadoop is made by configuring the HAWQ master's configuration file (hawq-site.xml).
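For example, a minimal hawq-site.xml sketch might point HAWQ at the namenode running on hadoop-master. The host names and the HDFS RPC port 8020 here are assumptions based on the node layout above; adjust them to your own cluster:

```xml
<configuration>
  <!-- Host where the HAWQ master runs -->
  <property>
    <name>hawq_master_address_host</name>
    <value>hawq-master</value>
  </property>
  <!-- HDFS location for HAWQ data: namenode host, RPC port, and directory -->
  <property>
    <name>hawq_dfs_url</name>
    <value>hadoop-master:8020/hawq_default</value>
  </property>
  <!-- Host where the HAWQ standby master runs -->
  <property>
    <name>hawq_standby_address_host</name>
    <value>hadoop-master</value>
  </property>
</configuration>
```

It is the `hawq_dfs_url` property that ties HAWQ to HDFS, which is why hawq-master itself does not need a full Hadoop installation.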
Usually, the HAWQ master and the Hadoop master components can each be installed on their own node, but some of them can be co-located to save nodes. For HDFS datanodes and HAWQ segments, however, we usually install them together. Taking the workload of each machine into account, you could lay them out as below:
    node            hadoop role          hawq role
    hadoop-master   namenode             hawq standby
    hawq-master     secondarynamenode    hawq master
    other nodes     datanode             segment
If you configure HAWQ with YARN integration, there will also be a resourcemanager and nodemanagers in the cluster:
    node            hadoop role                          hawq role
    hadoop-master   namenode                             hawq standby
    hawq-master     secondarynamenode, resourcemanager   hawq master
    other nodes     datanode, nodemanager                segment
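For the YARN integration, a sketch of the relevant hawq-site.xml properties is shown below. The resourcemanager host and the default YARN ports 8032/8030 are assumptions based on the layout above:

```xml
<configuration>
  <!-- Use YARN instead of HAWQ's standalone resource manager -->
  <property>
    <name>hawq_global_rm_type</name>
    <value>yarn</value>
  </property>
  <!-- YARN resourcemanager address (host:port) -->
  <property>
    <name>hawq_rm_yarn_address</name>
    <value>hawq-master:8032</value>
  </property>
  <!-- YARN scheduler address (host:port) -->
  <property>
    <name>hawq_rm_yarn_scheduler_address</name>
    <value>hawq-master:8030</value>
  </property>
</configuration>
```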
Installing components together does not mean they have connections; it is your configuration files that make them reachable from each other. You could install all the master components on one node, but that may be too heavy for the machine. Read more about Apache HAWQ at http://incubator.apache.org/projects/hawq.html and find the docs at http://hdb.docs.pivotal.io/211/hdb/index.html.
Besides, you can subscribe to the dev and user mailing lists: send an email to [email protected] / [email protected] to subscribe, and send emails to [email protected] / [email protected] to ask questions.