I have some questions about configuring Hive for production. If I have an HDFS cluster set up remotely:
Where would I have to install Hive so that I can run HQL queries against the data in HDFS? Which configuration settings need to be made in Hive?
Where would the metastore db be located?
Hive Server should be installed on a master node, as is done for the HDFS NameNode and Secondary NameNode (see this sample topology: http://pivotalhd.docs.pivotal.io/docs/01-RawContent/Getting-Started/PHD2_Typical_Cluster_Topology.png). You also need to install YARN, since Hive submits its queries as jobs that run on the cluster.
Sqoop is usually installed on a client (edge) node.
If you use a distribution like Hortonworks or Cloudera, it includes a management tool (Ambari or Cloudera Manager, respectively) with wizards that ease the deployment of all services such as Hive, YARN, and HBase.
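
If you configure Hive by hand instead, most of the settings go into hive-site.xml on the node running Hive. As a rough sketch only, the Python snippet below generates a minimal hive-site.xml. The property names are standard Hive settings, but the host names, the database name, and the choice of MySQL for the metastore database are assumptions made for illustration, as is the helper name build_hive_site. Adjust them to your own cluster.

```python
import xml.etree.ElementTree as ET


def build_hive_site(properties):
    """Return hive-site.xml content (as a string) for the given name/value pairs."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# All hosts and the database name below are hypothetical placeholders.
hive_site = build_hive_site({
    # Where Hive clients find the metastore service (9083 is the default port).
    "hive.metastore.uris": "thrift://metastore-host.example.com:9083",
    # Directory in HDFS where managed tables are stored.
    "hive.metastore.warehouse.dir": "/user/hive/warehouse",
    # Relational database backing the metastore (MySQL is assumed here).
    "javax.jdo.option.ConnectionURL": "jdbc:mysql://db-host.example.com:3306/hive_metastore",
    "javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver",
})

print(hive_site)
```

In this layout the metastore database lives in an external relational database rather than in HDFS, which is the usual answer to where the metastore is located in a production setup. The address of the remote HDFS itself is normally set through fs.defaultFS in core-site.xml, not in hive-site.xml.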