
Should the HBase region server and Hadoop data node be on the same machine?


Sorry, I don't have the resources to set up a cluster and test this myself, so I'm just wondering:

  1. Can I deploy an hbase region server on a separate machine from the hadoop data node? I guess the answer is yes, but I'm not sure.

  2. Is it good or bad to deploy hbase region server and hadoop data node on different machines?

  3. When I put some data into hbase, where is it eventually stored: on the data node or on the region server? I guess it's the data node, but then what are the StoreFile and HFile in the region server? Aren't they the physical files that store our data?

Thank you!


Solution

    1. Yes, you can, but RegionServers should always run alongside DataNodes in a distributed cluster if you want decent performance.

    2. Very bad: it works against the data locality principle. If you want to know a little more about data locality, check this: http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html

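    3. The actual data is stored in HDFS, that is, on the DataNodes. StoreFiles (HFiles) are the physical files holding your data, but the RegionServer writes them into HDFS rather than onto its own local disk. RegionServers are responsible for serving and managing regions.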
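
To make points 2 and 3 a bit more concrete, here is a toy Python model (not HBase or HDFS code) of what happens when a RegionServer flushes an HFile. It assumes a simplified version of the HDFS placement rule: the first replica of each block goes to the writer's own host if that host runs a DataNode, otherwise to some other DataNode. The function names (`place_replicas`, `flush_hfile`, `locality`) and host names are made up for illustration, and rack awareness is ignored.

```python
import random


def place_replicas(writer_host, datanodes, replication=3, rng=random):
    """Toy HDFS placement: first replica on the writer's host if it runs a
    DataNode, otherwise on a randomly chosen DataNode. Remaining replicas go
    to other DataNodes (rack awareness is deliberately ignored here)."""
    if writer_host in datanodes:
        first = writer_host
    else:
        first = rng.choice(datanodes)
    others = [d for d in datanodes if d != first]
    return [first] + rng.sample(others, replication - 1)


def flush_hfile(region_server_host, datanodes, num_blocks, rng=random):
    """Model a flush: the RegionServer writes one HFile as num_blocks HDFS
    blocks and returns the replica hosts chosen for each block."""
    return [place_replicas(region_server_host, datanodes, rng=rng)
            for _ in range(num_blocks)]


def locality(region_server_host, blocks):
    """Fraction of blocks that have a replica on the RegionServer's host."""
    if not blocks:
        return 0.0
    local = sum(1 for replicas in blocks if region_server_host in replicas)
    return local / len(blocks)


if __name__ == "__main__":
    datanodes = [f"dn{i}" for i in range(1, 11)]
    rng = random.Random(0)

    # RegionServer running on the same machine as DataNode dn1.
    colocated = flush_hfile("dn1", datanodes, 100, rng)
    # RegionServer running on a machine that has no DataNode.
    separate = flush_hfile("rs-only", datanodes, 100, rng)

    print("co-located locality:", locality("dn1", colocated))
    print("separate locality:  ", locality("rs-only", separate))
```

In this model the co-located RegionServer always finds one replica of every block on its own machine, so reads can be served from local disk, while the stand-alone RegionServer has no local replicas and every read has to cross the network.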

    For more information about the HBase architecture, please check this excellent post from Lars' blog: http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html

    BTW, as long as you have a PC with decent RAM, you can set up a demo cluster with virtual machines. Never try to set up a production environment without properly testing the platform in a development environment first.