How to interact with HBase via Knox using Python?


I'm trying to interact with HBase through Knox using Python. The admin gave me a list of Knox API endpoints for Hive, HBase and Spark, like: https://knox-devl.www.mysite.com:9042/gateway/MYSITEHDO/hbaseversion/cluster

Since I'm using Python's happybase library, my connection code is:

import happybase

connection=happybase.Connection('https://knox-devl.www.mysite.com/gateway/MYSITEHDO/hbaseversion/cluster',port=9042)
connection.open()
print(connection.tables())

The error it shows is: thriftpy.transport.TTransportException: TTransportException(message="Could not connect to ('https://knox-devl.www.mysite.com/gateway/MYSITEHDO/hbaseversion/cluster', 9042)", type=1)

I also tried the phoenixdb library:

import phoenixdb

database_url = 'https://knox-devl.www.mysite.com:9042/gateway/MYSITEHDO/hbaseversion/cluster'
conn = phoenixdb.connect(database_url, autocommit=True)
cursor = conn.cursor()
cursor.execute("SHOW tables")

But I'm getting another error: phoenixdb.errors.InterfaceError: ('RPC request failed', None, None, BadStatusLine("''",)) Exception phoenixdb.errors.InterfaceError: InterfaceError('RPC request failed', None, None, BadStatusLine("''",)) in <bound method Connection.__del__ of <phoenixdb.connection.Connection object at 0x10bc97d90>> ignored

The only way I can get any of the data is through curl:

curl -i -k -u guest:guest-password 'https://knox-devl.www.mysite.com:9042/gateway/MYSITEHDO/hbaseversion/cluster'

But there are no SQL commands there.

Does anyone know how to do this, or is there something I'm missing here, like asking for a different URL or enabling something on the cluster?


Solution

  • As you identified, the only way to talk to HBase through Knox is via HBase's REST API. Happybase tries to connect directly to HBase's Thrift server over RPC, which Knox will block.

    You can't use Happybase from outside a cluster with Knox enabled.

    A good tutorial for using the HBase REST API with Python can be found here. In case the link ever dies, some of the most useful commands from this article are:

    • Look at a table's schema:

      request = requests.get(baseurl + "/" + tablename + "/schema")
      
    • Insert a row:

      import base64
      from xml.etree.ElementTree import Element, SubElement, tostring

      import requests

      # baseurl, tablename, username, filename, shakespeare, encode() and the
      # *columnencoded names are not shown in this excerpt; they are assumed
      # to be defined elsewhere in the tutorial's script.
      cellset = Element('CellSet')

      linenumber = 0

      for line in shakespeare:
          rowKey = username + "-" + filename + "-" + str(linenumber).zfill(6)
          # On Python 3, b64encode() takes and returns bytes
          rowKeyEncoded = base64.b64encode(rowKey.encode()).decode()

          row = SubElement(cellset, 'Row', key=rowKeyEncoded)

          messageencoded = base64.b64encode(line.strip().encode()).decode()
          linenumberencoded = encode(linenumber)
          usernameencoded = base64.b64encode(username.encode()).decode()

          # Add bleet cell
          cell = SubElement(row, 'Cell', column=messagecolumnencoded)
          cell.text = messageencoded

          # Add username cell
          cell = SubElement(row, 'Cell', column=usernamecolumnencoded)
          cell.text = usernameencoded

          # Add Line Number cell
          cell = SubElement(row, 'Cell', column=linenumbercolumnencoded)
          cell.text = linenumberencoded

          linenumber = linenumber + 1

      # Submit the XML to the REST server once, after all rows have been added
      request = requests.post(baseurl + "/" + tablename + "/fakerow", data=tostring(cellset), headers={"Content-Type" : "text/xml", "Accept" : "text/xml"})
      
    • Delete a table:

      request = requests.delete(baseurl + "/" + tablename + "/schema")
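To tie the commands above back to the Knox setup in the question, here is a minimal sketch of reading one row through the gateway. It rests on several assumptions: that the HBase REST service is exposed under `<gateway>/gateway/<topology>/hbase` (Knox's usual URL mapping, so your admin's URL may differ), that Knox accepts HTTP basic auth as in the curl call, and that cell values are UTF-8 text. The helper names (`knox_hbase_url`, `b64`, `decode_cellset`, `get_row`) are made up for this example and are not part of any library.

```python
import base64
from urllib.parse import quote


def knox_hbase_url(gateway, topology, *parts):
    """Join the gateway root, topology name and HBase REST path parts.

    Assumes the common Knox layout <gateway>/gateway/<topology>/hbase/...
    """
    return "/".join([gateway.rstrip("/"), "gateway", topology, "hbase", *parts])


def b64(text):
    """Base64-encode a str; the REST API carries keys, columns and values this way."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_cellset(cellset):
    """Convert an HBase REST JSON CellSet into {row_key: {column: value}}.

    Assumes every value is UTF-8 text; binary values would need other handling.
    """
    rows = {}
    for row in cellset.get("Row", []):
        key = base64.b64decode(row["key"]).decode("utf-8")
        rows[key] = {
            base64.b64decode(cell["column"]).decode("utf-8"):
                base64.b64decode(cell["$"]).decode("utf-8")
            for cell in row.get("Cell", [])
        }
    return rows


def get_row(gateway, topology, table, row_key, user, password, verify=True):
    """Fetch a single row through Knox and return it decoded."""
    # Imported here so the pure helpers above work without requests installed.
    import requests

    url = knox_hbase_url(gateway, topology, table, quote(row_key, safe=""))
    response = requests.get(
        url,
        auth=(user, password),                  # basic auth, like curl -u
        headers={"Accept": "application/json"},
        verify=verify,                          # verify=False mirrors curl -k
    )
    response.raise_for_status()
    return decode_cellset(response.json())
```

A call would look like `get_row("https://knox-devl.www.mysite.com:9042", "MYSITEHDO", "mytable", "row1", "guest", "guest-password")`, where the table name and row key are placeholders. JSON is requested instead of XML only because it is shorter to decode; the same endpoints also accept `text/xml`, as in the insert example.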