scala, apache-spark, hbase, hadoop2

Scala: Creating an HBase table with pre-split regions based on row key


I have three RegionServers. I want to evenly distribute an HBase table across the three RegionServers based on row keys which I have already identified (say, rowkey_100 and rowkey_200). It can be done from the hbase shell using:

create 'tableName', 'columnFamily', {SPLITS => ['rowkey_100','rowkey_200']} 

If I am not mistaken, these two split points will create three regions: keys before rowkey_100 go to the first region, keys from rowkey_100 up to rowkey_200 go to the second, and the remaining keys go to the last, so each RegionServer hosts one region. I want to do the same thing using Scala code. How can I specify these split points when creating the table in Scala?


Solution

  • Below is a Scala snippet for creating an HBase table with split points (using the classic HBaseAdmin / HTableDescriptor API):

    import org.apache.hadoop.hbase.{HColumnDescriptor, HTableDescriptor}
    import org.apache.hadoop.hbase.client.HBaseAdmin

    val admin = new HBaseAdmin(conf)

    if (!admin.tableExists(myTable)) {
      val htd = new HTableDescriptor(myTable)
      val hcd = new HColumnDescriptor(myCF)
      // Two split points produce three regions.
      val splits = Array[Array[Byte]](splitPoint1.getBytes, splitPoint2.getBytes)

      htd.addFamily(hcd)
      admin.createTable(htd, splits)
    }
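
    The snippet assumes conf, myTable, myCF, splitPoint1, and splitPoint2 are already in scope; a minimal sketch of those definitions (the literal values are just the ones from the question) could be:

    import org.apache.hadoop.hbase.HBaseConfiguration

    val conf = HBaseConfiguration.create() // picks up hbase-site.xml from the classpath
    val myTable = "tableName"
    val myCF = "columnFamily"
    val splitPoint1 = "rowkey_100"
    val splitPoint2 = "rowkey_200"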
    

    There are some predefined region split policies, but if you want your own way of computing split points that span your rowkey range, you can write a simple function like the following:

    def autoSplits(n: Int, range: Int = 256): Array[Array[Byte]] = {
      val splitPoints = new Array[Array[Byte]](n)
      // Place n single-byte split points at even intervals across the key range.
      for (i <- 0 until n) {
        splitPoints(i) = Array[Byte](((range / (n + 1)) * (i + 1)).toByte)
      }
      splitPoints
    }
    

    Just comment out the val splits = ... line and replace createTable's splits parameter with autoSplits(2) or autoSplits(4, 128), etc.
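
    With the default range, autoSplits(2) produces single-byte split points at 0x55 and 0xAA (256 / 3 = 85, and 85 * 2 = 170), which again yields three evenly sized regions. A sketch of the call, reusing admin and htd from the first snippet:

    admin.createTable(htd, autoSplits(2))

    For the uniform case there is also a createTable overload on HBaseAdmin that derives evenly spaced split points itself from a start key, an end key, and a region count; a minimal sketch (the keys are just the ones from the question):

    import org.apache.hadoop.hbase.util.Bytes

    // First region ends at rowkey_100, last region starts at rowkey_200,
    // matching the three-region layout described in the question.
    admin.createTable(htd, Bytes.toBytes("rowkey_100"), Bytes.toBytes("rowkey_200"), 3)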