
How does HBase internally analyze an "hbase shell" command?


Suppose I run the get 't1','r1' command in the hbase shell. How does HBase internally analyze and execute this command?


Solution

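  • The get command is a JRuby script, defined alongside the other HBase shell commands. The script does not talk to the servers itself; it hands the request to the HBase Java client.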

    I'll use a Java HashMap as an analogy to make this easier to understand.

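    • While inserting, your row key plays the role of the key in a Java HashMap: it decides which region, and therefore which region server, stores the row. Unlike a HashMap, HBase does not hash the key into a bucket; each region holds a contiguous, sorted range of row keys.
    • While getting the row back, the client uses the row key to find the region whose key range contains it (the location comes from the hbase:meta table and is cached by the client), sends the request to the region server hosting that region, and gets back the row's cells from the table you specified, much like map.get(key).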
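A toy model of the lookup described above may help. It assumes regions are identified by their start keys (as they are in hbase:meta) and uses Strings instead of byte[] for readability; the class name and the server names rs1 to rs3 are hypothetical. A TreeMap's floorEntry finds the region with the greatest start key that is not after the row key, which is the range-based lookup that separates HBase from a HashMap.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Toy model of how a client finds the region for a row key.
 * Each region is keyed by its inclusive start key; the region holding a row
 * is the one with the greatest start key <= the row key.
 */
public class RegionLookup {
    // start key -> hosting server; "" is the first region's empty start key
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    public void addRegion(String startKey, String server) {
        regionsByStartKey.put(startKey, server);
    }

    public String serverFor(String rowKey) {
        // floorEntry returns the entry with the greatest key <= rowKey
        Map.Entry<String, String> e = regionsByStartKey.floorEntry(rowKey);
        if (e == null) {
            throw new IllegalStateException("no region covers " + rowKey);
        }
        return e.getValue();
    }

    public static void main(String[] args) {
        RegionLookup meta = new RegionLookup();
        meta.addRegion("", "rs1");  // covers ["", "g")
        meta.addRegion("g", "rs2"); // covers ["g", "p")
        meta.addRegion("p", "rs3"); // covers ["p", end)
        System.out.println(meta.serverFor("r1")); // prints rs3
    }
}
```

With these three hypothetical regions, get 't1','r1' would be routed to rs3, because "r1" sorts after "p". Real HBase compares row keys as unsigned bytes, which gives the same order as String comparison for plain ASCII keys like these.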

    That's why row key design matters so much in HBase. Because rows are stored in sorted order, monotonically increasing keys (timestamps, sequence numbers) all land in the last region and hit a single region server. Spreading keys uniformly across region servers, for example by salting them or prefixing them with a hash such as Murmur hash, prevents this hotspotting.
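Salting can be sketched as follows, under a few assumptions of my own: a fixed number of buckets (4 here, an arbitrary choice), a two-digit "NN-" prefix format, and java.util.zip.CRC32 from the JDK standing in for Murmur hash, which the JDK does not ship. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/**
 * Sketch of row-key salting: prefix each key with a bucket number derived from a
 * hash of the key, so sequential keys spread over several regions instead of all
 * landing in the last one. CRC32 stands in for Murmur hash here.
 */
public class SaltedKeys {
    static int bucketFor(String rowKey, int numBuckets) {
        CRC32 crc = new CRC32();
        crc.update(rowKey.getBytes(StandardCharsets.UTF_8));
        // getValue() is in [0, 2^32), so the remainder is never negative
        return (int) (crc.getValue() % numBuckets);
    }

    static String salt(String rowKey, int numBuckets) {
        // Zero-padded prefix so salted keys sort by bucket first (assumes numBuckets <= 100)
        return String.format("%02d-%s", bucketFor(rowKey, numBuckets), rowKey);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            String key = "event-" + i;
            System.out.println(key + " -> " + salt(key, 4));
        }
    }
}
```

Because the prefix is computed from the key itself, a point get can recompute it and go straight to the right row. The trade-off is that a scan over a range of original keys has to be issued once per bucket and the results merged.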

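    For more details, have a look at get.rb below. Its command method resolves the table name to a table object and calls get, which prints the header, lets the table's _get_internal method (defined in the shell's table.rb) fetch the row and yield each column and value to the formatter, and finally prints the footer.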

    module Shell
      module Commands
        class Get < Command
          def help
            return <<-EOF
    Get row or cell contents; pass table name, row, and optionally
    a dictionary of column(s), timestamp, timerange and versions. Examples:
      hbase> get 'ns1:t1', 'r1'
      hbase> get 't1', 'r1'
      hbase> get 't1', 'r1', {TIMERANGE => [ts1, ts2]}
      hbase> get 't1', 'r1', {COLUMN => 'c1'}
      hbase> get 't1', 'r1', {COLUMN => ['c1', 'c2', 'c3']}
      hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
      hbase> get 't1', 'r1', {COLUMN => 'c1', TIMERANGE => [ts1, ts2], VERSIONS => 4}
      hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1, VERSIONS => 4}
      hbase> get 't1', 'r1', {FILTER => "ValueFilter(=, 'binary:abc')"}
      hbase> get 't1', 'r1', 'c1'
      hbase> get 't1', 'r1', 'c1', 'c2'
      hbase> get 't1', 'r1', ['c1', 'c2']
      hbase> get 't1', 'r1', {COLUMN => 'c1', ATTRIBUTES => {'mykey'=>'myvalue'}}
      hbase> get 't1', 'r1', {COLUMN => 'c1', AUTHORIZATIONS => ['PRIVATE','SECRET']}
      hbase> get 't1', 'r1', {CONSISTENCY => 'TIMELINE'}
      hbase> get 't1', 'r1', {CONSISTENCY => 'TIMELINE', REGION_REPLICA_ID => 1}
    Besides the default 'toStringBinary' format, 'get' also supports custom formatting by
    column.  A user can define a FORMATTER by adding it to the column name in the get
    specification.  The FORMATTER can be stipulated: 
     1. either as a org.apache.hadoop.hbase.util.Bytes method name (e.g, toInt, toString)
     2. or as a custom class followed by method name: e.g. 'c(MyFormatterClass).format'.
    Example formatting cf:qualifier1 and cf:qualifier2 both as Integers: 
      hbase> get 't1', 'r1', {COLUMN => ['cf:qualifier1:toInt',
        'cf:qualifier2:c(org.apache.hadoop.hbase.util.Bytes).toInt'] }
    Note that you can specify a FORMATTER by column only (cf:qualifier).  You cannot specify
    a FORMATTER for all columns of a column family.
    
    The same commands also can be run on a reference to a table (obtained via get_table or
    create_table). Suppose you had a reference t to table 't1', the corresponding commands
    would be:
      hbase> t.get 'r1'
      hbase> t.get 'r1', {TIMERANGE => [ts1, ts2]}
      hbase> t.get 'r1', {COLUMN => 'c1'}
      hbase> t.get 'r1', {COLUMN => ['c1', 'c2', 'c3']}
      hbase> t.get 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
      hbase> t.get 'r1', {COLUMN => 'c1', TIMERANGE => [ts1, ts2], VERSIONS => 4}
      hbase> t.get 'r1', {COLUMN => 'c1', TIMESTAMP => ts1, VERSIONS => 4}
      hbase> t.get 'r1', {FILTER => "ValueFilter(=, 'binary:abc')"}
      hbase> t.get 'r1', 'c1'
      hbase> t.get 'r1', 'c1', 'c2'
      hbase> t.get 'r1', ['c1', 'c2']
      hbase> t.get 'r1', {CONSISTENCY => 'TIMELINE'}
      hbase> t.get 'r1', {CONSISTENCY => 'TIMELINE', REGION_REPLICA_ID => 1}
    EOF
          end
    
          def command(table, row, *args)
            get(table(table), row, *args)
          end
    
          def get(table, row, *args)
            @start_time = Time.now
            formatter.header(["COLUMN", "CELL"])
    
            count, is_stale = table._get_internal(row, *args) do |column, value|
              formatter.row([ column, value ])
            end
    
            formatter.footer(count, is_stale)
          end
        end
      end
    end
    
    #add get command to table
    ::Hbase::Table.add_shell_command('get')
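To make the control flow of get.rb concrete, here is a self-contained Java mock of it. It is not HBase code: an in-memory map replaces the region servers, getInternal is a hypothetical stand-in for the shell's _get_internal, and the printed format is simplified rather than the real shell's output. The point is the shape: a header, one formatter row per (column, value) pair yielded by the table, then a footer with the count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;

/** In-memory mock of the header / yield-per-cell / footer flow in get.rb. */
public class ShellGetFlow {
    // table -> row key -> ("family:qualifier" -> value); TreeMaps keep keys sorted
    private final Map<String, TreeMap<String, TreeMap<String, String>>> tables = new TreeMap<>();

    void put(String table, String row, String column, String value) {
        tables.computeIfAbsent(table, t -> new TreeMap<>())
              .computeIfAbsent(row, r -> new TreeMap<>())
              .put(column, value);
    }

    // Stand-in for table._get_internal: yields each cell to the block, returns the cell count
    int getInternal(String table, String row, BiConsumer<String, String> block) {
        TreeMap<String, String> cells =
                tables.getOrDefault(table, new TreeMap<>()).getOrDefault(row, new TreeMap<>());
        cells.forEach(block);
        return cells.size();
    }

    // Mirrors Get#get: header, one formatter row per yielded cell, footer
    List<String> get(String table, String row) {
        List<String> out = new ArrayList<>();
        out.add("COLUMN CELL");
        int count = getInternal(table, row, (column, value) -> out.add(column + " " + value));
        out.add(count + " cell(s)");
        return out;
    }

    public static void main(String[] args) {
        ShellGetFlow shell = new ShellGetFlow();
        shell.put("t1", "r1", "cf:b", "2");
        shell.put("t1", "r1", "cf:a", "1");
        // Rough equivalent of: hbase> get 't1', 'r1'
        shell.get("t1", "r1").forEach(System.out::println);
    }
}
```

Running main prints "COLUMN CELL", "cf:a 1", "cf:b 2" and "2 cell(s)"; the columns come back sorted because the mock uses TreeMaps, loosely echoing how HBase returns cells in sorted order.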
    

    Update based on your comment: if you want the same functionality in Java, i.e. fetching one record the way the hbase shell get command does, you can follow the snippet below.

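    import java.io.IOException;
    import org.apache.hadoop.hbase.Cell;
    import org.apache.hadoop.hbase.CellUtil;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;

    /**
     * Get a row (HBase 1.0+ client API; the HTable constructors and Result.raw() are deprecated)
     */
    @Override
    public void getOneRecord(final String tableName, final String rowKey) throws IOException {
        // Connections are heavyweight: in real code create one and share it. Both are Closeable.
        try (Connection conn = ConnectionFactory.createConnection(HBaseConn.getHBaseConfig());
             Table table = conn.getTable(TableName.valueOf(tableName))) {
            final Result rs = table.get(new Get(Bytes.toBytes(rowKey)));
            for (final Cell cell : rs.rawCells()) {
                // Row, family, qualifier and value are byte[]; decode them before logging
                LOG.info(Bytes.toString(CellUtil.cloneRow(cell)) + " "
                        + Bytes.toString(CellUtil.cloneFamily(cell)) + ":"
                        + Bytes.toString(CellUtil.cloneQualifier(cell)) + " " + cell.getTimestamp());
                LOG.info(Bytes.toString(CellUtil.cloneValue(cell)));
            }
        }
    }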
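One pitfall when logging a cell: its row, family, qualifier and value are byte[]. Concatenating a byte[] into a String, as in kv.getRow() + " ", calls Object.toString() on the array, so the log shows something like [B@1b6d3586 instead of the row key. The sketch below contrasts the two forms using only the JDK; it assumes UTF-8 encoded keys, and the class and method names are hypothetical. In HBase code, Bytes.toString does the same UTF-8 decoding.

```java
import java.nio.charset.StandardCharsets;

public class CellLogLine {
    // byte[] + String uses the array's Object.toString(), e.g. "[B@6d06d69c", not its contents
    static String naive(byte[] row, byte[] family, byte[] qualifier, long ts) {
        return row + " " + family + ":" + qualifier + " " + ts;
    }

    // Decoding each field (UTF-8 assumed) gives the readable "row family:qualifier ts" form
    static String decoded(byte[] row, byte[] family, byte[] qualifier, long ts) {
        return utf8(row) + " " + utf8(family) + ":" + utf8(qualifier) + " " + ts;
    }

    private static String utf8(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] row = "r1".getBytes(StandardCharsets.UTF_8);
        byte[] cf = "cf".getBytes(StandardCharsets.UTF_8);
        byte[] q = "c1".getBytes(StandardCharsets.UTF_8);
        System.out.println(naive(row, cf, q, 42L));   // array identities such as [B@...
        System.out.println(decoded(row, cf, q, 42L)); // prints r1 cf:c1 42
    }
}
```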
    

    Note: the Java approach and the shell approach are two different things, so please don't mix them. Judging from your other questions, I think you are a bit confused about them. You can also write JRuby scripts like the one I explained above, but that is not the common approach.

    Hope that helps.