apache-spark, hive, hadoop-yarn, hortonworks-data-platform, ambari

Do we need to install all HDP service clients on every node?


We want to deploy HDP 3.1.5 in a production environment. We have 3 servers for master nodes and 6 servers for worker nodes, and we have planned the component layout across these 9 nodes, but we want to make sure where we need to place the service clients listed below.

  1. yarn clients

We initially planned to install this on all 9 nodes; is that okay, or should we install it only on the 3 master nodes? As far as we know, YARN is needed on all nodes, including the ResourceManager and NodeManager hosts.

Or is it only needed to launch YARN applications, or for something else?

  2. mapreduce2 clients

Same as above: we plan to install it on all 9 nodes because it is required for MapReduce jobs. Do we really need to install it on all 9 nodes?

  3. hive clients

We plan to install it on the 3 master nodes, or do we only need it on a single master node? Is it only needed to submit Hive queries from Beeline (the CLI)?

  4. infra solr clients

We plan to install it on all 9 nodes, but we don't know enough about how this client works.

  5. kerberos clients

Do all nodes need the Kerberos client? It was automatically installed on all nodes when we deployed our development environment.

  6. oozie clients

Same as the infra solr clients point: we plan to install it on all 9 nodes.

  7. Pig Clients

We plan to install it only on the 3 master nodes. Is it needed to run Pig via the CLI, or to submit Pig applications?

  8. spark2 clients

We plan to install it on a single master node because we want to limit Spark job submission to only one server.

But in our development environment it is installed on all nodes; how do we uninstall the spark2 client from the worker nodes?

  9. sqoop clients

Same as the spark2 clients point: only on a single master node.

  10. Tez client

We plan to install it on all 9 nodes, but we don't have any information on how this client works.


Solution

The client of any service is nothing but a set of libraries and binaries that let you connect to and access that service from the nodes where the client is installed.

You can certainly restrict which nodes you install the clients on.

Some clients have to be installed on all nodes, e.g. the Kerberos client.

Clients do not use much disk space; however, the more clients you have, the longer that service will take to start.

Whenever you start or restart a service, Ambari will by default check whether its clients are installed. (There is no way to bypass this.)
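For reference, you can see exactly which clients (and other components) Ambari has installed on a given host through the Ambari REST API. A minimal sketch, assuming the default Ambari port 8080 and admin credentials; AMBARI_SERVER_HOST, CLUSTERNAME and HOSTNAME are placeholders:

    # List every component Ambari manages on one host, including the *_CLIENT
    # components, together with its current state (INSTALLED, STARTED, ...).
    curl -u admin:admin -H "X-Requested-By: ambari" \
      "http://AMBARI_SERVER_HOST:8080/api/v1/clusters/CLUSTERNAME/hosts/HOSTNAME/host_components?fields=HostRoles/component_name,HostRoles/state"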

Now, with all that being said, let's take a look at your scenario:

1. yarn clients: not mandatory on the master hosts, but good to have on all nodes (see the YARN/MapReduce sketch after this list).

2. mapreduce2 clients: same as the yarn clients; not mandatory on the master hosts, but good to have on all nodes.

3. hive clients: yes, they are only needed to run Beeline and submit Hive queries from the command line, so you can choose which hosts to install them on (see the Beeline sketch after this list).

4. infra solr clients: installing them on 2 or 3 nodes would be sufficient, as the client is only needed to access the service, unless you use Infra Solr extensively.

5. kerberos clients: these need to be on all nodes; otherwise you will run into Kerberos issues (see the kinit sketch after this list).

6. oozie clients: installing them on 2 or 3 nodes would be sufficient (see the Oozie CLI sketch after this list).

7. Pig Clients: they are related to both, running Pig from the CLI and submitting Pig applications (see the Pig sketch after this list).

8. spark2 clients: you can remove the client from the worker nodes through the Ambari REST API (a concrete spark2 example follows after this list):

        curl -u admin:admin -H "X-Requested-By: ambari" -X DELETE http://AMBARI_SERVER_HOST:8080/api/v1/clusters/CLUSTERNAME/hosts/HOSTNAME/host_components/Client_name

9. sqoop clients: installing them on 2 or 3 nodes would be sufficient (see the Sqoop sketch after this list).

10. Tez client: installing it on 2 or 3 nodes would be sufficient (see the Tez sketch after this list).
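To make items 1 and 2 concrete: the yarn and mapreduce2 clients are what give a node the yarn/hadoop/mapred commands plus the client configuration needed to submit and inspect jobs. A minimal sketch, assuming an HDP layout under /usr/hdp/current (the exact examples-jar path may differ on your install) and that you have already authenticated with kinit on a kerberized cluster:

    # Submit the bundled MapReduce "pi" example to YARN from a node that has
    # the yarn and mapreduce2 clients installed.
    yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar pi 4 1000

    # List running YARN applications from the same node.
    yarn application -list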
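For item 3, the hive client essentially provides Beeline and the Hive client configuration. A minimal sketch of submitting a query, assuming HiveServer2 on a placeholder host hs2-host, the default binary port 10000, and a placeholder Kerberos principal:

    # Connect to HiveServer2 with Beeline and run a query from a node
    # that has the hive client installed.
    beeline -u "jdbc:hive2://hs2-host:10000/default;principal=hive/_HOST@EXAMPLE.COM" \
            -e "SHOW DATABASES;"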
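For item 5, the Kerberos client packages provide kinit, klist and /etc/krb5.conf on every host, and services and users rely on them to authenticate, which is why Ambari puts them everywhere. A minimal sketch with a placeholder principal and keytab path:

    # Obtain a Kerberos ticket from a keytab and verify it.
    kinit -kt /etc/security/keytabs/myuser.keytab myuser@EXAMPLE.COM
    klist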
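For item 6, the oozie client is only the oozie command-line tool used to submit and monitor workflows against the Oozie server, so a couple of nodes is enough. A minimal sketch, assuming the Oozie server on a placeholder host oozie-host with the default port 11000 and an already prepared job.properties:

    # Submit and start a workflow, then check its status.
    oozie job -oozie http://oozie-host:11000/oozie -config job.properties -run
    oozie job -oozie http://oozie-host:11000/oozie -info <job-id>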
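For item 7, "related to both" means the Pig client provides the pig command for the interactive Grunt shell as well as for submitting Pig scripts that run as jobs on the cluster. A minimal sketch with a placeholder script name:

    # Run a Pig script on the cluster using the Tez execution engine.
    pig -x tez my_script.pig

    # Or open the interactive Grunt shell.
    pig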
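For item 8, the generic DELETE call above maps onto your case roughly as follows; a sketch assuming the Spark2 client's Ambari component name is SPARK2_CLIENT and using placeholder cluster and worker host names. Clients are never in a "started" state, so they can be deleted directly; note that this removes the component from Ambari's management, while the packages themselves typically remain on disk.

    # Confirm the client exists on the worker node...
    curl -u admin:admin -H "X-Requested-By: ambari" \
      "http://AMBARI_SERVER_HOST:8080/api/v1/clusters/CLUSTERNAME/hosts/worker01.example.com/host_components/SPARK2_CLIENT"

    # ...then remove it from that host. Repeat for each worker node.
    curl -u admin:admin -H "X-Requested-By: ambari" -X DELETE \
      "http://AMBARI_SERVER_HOST:8080/api/v1/clusters/CLUSTERNAME/hosts/worker01.example.com/host_components/SPARK2_CLIENT"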
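For item 9, the sqoop client is just the sqoop CLI and its libraries, so it only needs to be on the node(s) you run imports and exports from. A minimal sketch with placeholder JDBC connection details:

    # Verify connectivity to the source RDBMS, then import one table into HDFS.
    sqoop list-databases --connect jdbc:mysql://db-host:3306/ --username etl -P
    sqoop import --connect jdbc:mysql://db-host:3306/sales --username etl -P \
          --table orders --target-dir /data/raw/orders -m 4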
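For item 10, the Tez client is mostly the Tez libraries and tez-site.xml that other clients (Hive, Pig) pick up when they run their jobs on the Tez engine, so it generally only needs to be wherever those clients are. One way to sanity-check it, assuming the HDP tez-client directory below contains the examples jar (paths and input/output directories are placeholders):

    # Run the Tez ordered word count example directly against YARN.
    yarn jar /usr/hdp/current/tez-client/tez-examples-*.jar orderedwordcount /tmp/in.txt /tmp/out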

Please keep in mind that you can install the clients in any way you want.

I would suggest choosing 3-4 nodes and installing all of the required clients on those hosts.