HBase: put multiple versions of a row at the same time using JSON


According to the Cloudera HBase REST API docs, this is the XML structure to PUT multiple rows at the same time:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
  <CellSet>
    <Row key="cm93NQo=">
      <Cell column="Y2Y6ZQo=">dmFsdWU1Cg==</Cell>
      <Cell column="Y2Y6ZQo=">dmFsdWU1Cg==</Cell>
    </Row>
    <Row key="cm93NQo=">
      <Cell column="Y2Y6ZQo=">dmFsdWU1Cg==</Cell>
    </Row>
  </CellSet>
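
The row key, column, and cell values in this structure are Base64-encoded. As a quick check, the following Python sketch decodes the three strings used above. Each decoded value ends with a newline, which suggests they were generated with something like `echo row5 | base64` (an assumption; the excerpt does not say how they were produced). The helper name `decode` is only illustrative.

```python
import base64


def decode(b64_text: str) -> str:
    """Decode a Base64 string from the CellSet payload into text."""
    return base64.b64decode(b64_text).decode("utf-8")


# Values copied from the XML example above.
decoded = {
    "row key": decode("cm93NQo="),
    "column": decode("Y2Y6ZQo="),
    "value": decode("dmFsdWU1Cg=="),
}

for name, text in decoded.items():
    # repr() makes the trailing newline visible.
    print(f"{name}: {text!r}")
```

The column decodes to `cf:e` plus a newline, i.e. column family `cf` and qualifier `e`.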

Q: How do I do it using JSON?

What I've tried so far:

  1. With a CellSet key, which fails with the following error:

Error 500 Unrecognized field "CellSet" (Class org.apache.hadoop.hbase.rest.model.CellSetModel), not marked as ignorable

    {
      "CellSet": {
        "Row": [
          {
            "key": "cm93NQo=",
            "Cell": [
              {
                "column": "Y2Y6ZQo=",
                "$": "dmFsdWU1Cg=="
              },
              {
                "column": "Y2Y6ZQo=",
                "$": "dmFsdWU1Cg=="
              }
            ]
          },
          {
            "key": "cm93NQo=",
            "Cell": [
              {
                "column": "Y2Y6ZQo=",
                "$": "dmFsdWU1Cg=="
              }
            ]
          }
        ]
      }
    }

  2. Without the CellSet key, which succeeds without errors but stores only one version per row:

{
   "Row": [
    {
      "key": "cm93NQo=",
      "Cell": [
        {
          "column": "Y2Y6ZQo=",
          "$": "dmFsdWU1Cg=="
        },
        {
          "column": "Y2Y6ZQo=",
          "$": "dmFsdWU1Cg=="
        }
      ]
    },
    {
      "key": "cm93NQo=",
      "Cell": [
        {
          "column": "Y2Y6ZQo=",
          "$": "dmFsdWU1Cg=="
        }
      ]
    }
  ]
}
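
The second attempt shows the JSON shape the server accepted: a top-level "Row" array with no "CellSet" wrapper, where each row has a "key" and a "Cell" array, and the cell text goes under "$". A minimal Python sketch of that mapping from the XML form is shown here; the function name `cellset_xml_to_json` and the sample document are illustrative, and the sketch only covers the attributes that appear in the question.

```python
import json
import xml.etree.ElementTree as ET

# Sample input mirroring the XML structure from the question.
SAMPLE_XML = """<CellSet>
  <Row key="cm93NQo=">
    <Cell column="Y2Y6ZQo=">dmFsdWU1Cg==</Cell>
    <Cell column="Y2Y6ZQo=">dmFsdWU1Cg==</Cell>
  </Row>
</CellSet>"""


def cellset_xml_to_json(xml_text: str) -> str:
    """Convert a CellSet XML document into the JSON shape used in attempt 2."""
    root = ET.fromstring(xml_text)
    rows = []
    for row in root.findall("Row"):
        cells = [
            # The element text becomes the "$" member in JSON.
            {"column": cell.get("column"), "$": (cell.text or "").strip()}
            for cell in row.findall("Cell")
        ]
        rows.append({"key": row.get("key"), "Cell": cells})
    # No "CellSet" wrapper: the root object holds "Row" directly.
    return json.dumps({"Row": rows}, indent=2)


print(cellset_xml_to_json(SAMPLE_XML))
```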


Solution

  • You can't insert multiple versions of a row if they all have the same timestamp, because HBase tells versions of a cell apart by timestamp. In your example the data is identified only by a row key and a column. I have not worked with Cloudera and have never used the HBase REST API, but according to the source code on GitHub, CellModel allows setting a cell timestamp. So I suggest adding it to your request:

    {
      "Row": [
        {
          "key": "myRowKey",
          "Cell": [
            {
              "column": "myColumn",
              "$": "value1",
              "timestamp": 1473379200
            },
            {
              "column": "myColumn",
              "$": "value2",
              "timestamp": 1470000000
            }
          ]
        }
      ]
    }

    Also, your example contains two rows with the same key; check that the data is correct.
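
A minimal Python sketch of building such a request body follows. It rests on a few assumptions that are not confirmed in this thread: that the REST gateway expects Base64 strings for the key, column, and value (as in the question's payloads), and that timestamps are epoch milliseconds, which is HBase's usual convention. The names `b64` and `build_versions_payload` are hypothetical helpers, not part of any HBase client.

```python
import base64
import json


def b64(text: str) -> str:
    """Base64-encode text, as the question's payloads do for key, column and value."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def build_versions_payload(row_key: str, column: str, versions: list) -> dict:
    """Build a JSON body with one cell per (timestamp_ms, value) pair.

    Assumption: timestamps are epoch milliseconds and must be distinct,
    since two cells with the same row, column and timestamp are the same version.
    """
    timestamps = [ts for ts, _ in versions]
    if len(set(timestamps)) != len(timestamps):
        raise ValueError("each version needs a distinct timestamp")
    cells = [
        {"column": b64(column), "$": b64(value), "timestamp": ts}
        for ts, value in versions
    ]
    return {"Row": [{"key": b64(row_key), "Cell": cells}]}


payload = build_versions_payload(
    "row5",
    "cf:e",
    [(1473379200000, "value1"), (1470000000000, "value2")],
)
print(json.dumps(payload, indent=2))
```

Even with distinct timestamps, it is worth checking that the column family is configured to retain more than one version; otherwise only the newest cell is kept.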