HBase CopyTable inside Java


I want to copy one HBase table to another location with good performance.

I would like to reuse the code from CopyTable.java in the hbase-server module on GitHub.

I've been looking at the HBase documentation, but it didn't help me much: http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/CopyTable.html

After reading this Stack Overflow post: Can a main() method of class be invoked in another class in java

I think I can call it directly through its main() method.

Question: Is there a better way to do this copy than using CopyTable from hbase-server? Do you see any drawbacks to using CopyTable?


Solution

  • Question: Is there a better way to do this copy than using CopyTable from hbase-server? Do you see any drawbacks to using CopyTable?

    First of all, snapshots are a better option than CopyTable.

    • HBase snapshots allow you to take a snapshot of a table without too much impact on the region servers. Snapshot, clone, and restore operations don't involve copying data. Exporting a snapshot to another cluster also has no impact on the region servers.

    Prior to version 0.94.6, the only ways to back up or clone a table were to use CopyTable/ExportTable, or to copy all the HFiles in HDFS after disabling the table. The disadvantages of these methods are that you can degrade region server performance (CopyTable/ExportTable) or that you need to disable the table, which means no reads or writes; this is usually unacceptable.

    Also, see Snapshots and Repeatable reads for HBase Tables

    Snapshot Internals


    Another MapReduce approach besides CopyTable:

    You can implement something like the code below. It is written for a standalone program; in your case you would write a MapReduce job that inserts multiple Put records as one batch (perhaps 100000 at a time).

    This improved performance for standalone inserts through the HBase client; you can try the same approach in a MapReduce job.

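    public void addMultipleRecordsAtaShot(final ArrayList<Put> puts, final String tableName) throws Exception {
        if (puts == null || puts.isEmpty()) {
            return; // nothing to write
        }
        // HTable is Closeable: try-with-resources releases it even if put() fails.
        // HBaseConnection and getTable(...) are the author's own helpers.
        try (HTable table = new HTable(HBaseConnection.getHBaseConfiguration(), getTable(tableName))) {
            table.put(puts); // one batched call instead of one call per Put
            LOG.info("INSERT record[s] " + puts.size() + " to table " + tableName + " OK.");
        } finally {
            // Errors now propagate to the caller instead of being swallowed.
            LOG.info("Processed ---> " + puts.size());
            puts.clear();
        }
    }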
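
To keep each batch near the suggested size (for example 100000 Puts), the input can be cut into fixed-size chunks before calling the method above. The following is a minimal sketch in plain Java; `BatchSplitter` and `split` are hypothetical names, not part of HBase, and the element type is generic so the same helper should work for a list of Put objects.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not an HBase class): cuts a list into fixed-size batches.
class BatchSplitter {

    /** Returns consecutive chunks of at most batchSize elements, in the original order. */
    static <T> List<List<T>> split(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the range so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```

Each chunk is copied rather than returned as a `subList` view, because the method above clears the list it receives; clearing a view would also remove those elements from the original list.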
    

    Along with that, you can also consider the following.

    Enable the write buffer and set it to a larger value than the default:

    1) table.setAutoFlush(false)

    2) Set the buffer size

    <property>
             <name>hbase.client.write.buffer</name>
             <value>20971520</value> <!-- you can double this for better performance: 2 x 20971520 = 41943040 -->
     </property>
                 OR
    
        void setWriteBufferSize(long writeBufferSize) throws IOException
    

    The buffer is only ever flushed on two occasions:
    Explicit flush
    Use the flushCommits() call to send the data to the servers for permanent storage.

    Implicit flush
    This is triggered when you call put() or setWriteBufferSize(). Both calls compare the currently used buffer size with the configured limit and optionally invoke the flushCommits() method.

    If the write buffer is disabled altogether, that is, with setAutoFlush(true), the client calls the flush method on every invocation of put().
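
The flush rules described above can be illustrated with a small model. This is a simplified sketch in plain Java, not HBase code: `WriteBufferModel` is a hypothetical class that only counts bytes and flushes, and it assumes a flush is triggered once the buffered size exceeds the configured limit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of a client-side write buffer (not the HBase implementation).
class WriteBufferModel {
    private final List<byte[]> buffer = new ArrayList<>();
    private long bufferedBytes = 0;
    private long limitBytes;
    private boolean autoFlush = true; // buffering is off until setAutoFlush(false)
    private int flushCount = 0;

    WriteBufferModel(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    void setAutoFlush(boolean autoFlush) {
        this.autoFlush = autoFlush;
    }

    // Implicit flush: changing the limit re-checks the current buffer size.
    void setWriteBufferSize(long limitBytes) {
        this.limitBytes = limitBytes;
        flushIfOverLimit();
    }

    // Implicit flush: every put compares the buffered size with the limit.
    void put(byte[] record) {
        buffer.add(record);
        bufferedBytes += record.length;
        if (autoFlush) {
            flushCommits(); // buffer disabled: one flush per put
        } else {
            flushIfOverLimit();
        }
    }

    // Explicit flush: sends everything buffered so far (here it just empties the buffer).
    void flushCommits() {
        if (buffer.isEmpty()) {
            return;
        }
        buffer.clear();
        bufferedBytes = 0;
        flushCount++;
    }

    private void flushIfOverLimit() {
        if (bufferedBytes > limitBytes) {
            flushCommits();
        }
    }

    int flushCount() {
        return flushCount;
    }

    long bufferedBytes() {
        return bufferedBytes;
    }
}
```

With a 10-byte limit and auto-flush off, two 4-byte puts stay buffered and the third one triggers an implicit flush; with auto-flush left on, every put flushes immediately.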