I'm trying to interact with HBase through Knox using Python.
The admin gave me a list of Knox API endpoints for Hive, HBase and Spark, like:
https://knox-devl.www.mysite.com:9042/gateway/MYSITEHDO/hbaseversion/cluster
I'm using Python's happybase library, so my connection code is:
import happybase
connection=happybase.Connection('https://knox-devl.www.mysite.com/gateway/MYSITEHDO/hbaseversion/cluster',port=9042)
connection.open()
print(connection.tables())
The error it shows is:
thriftpy.transport.TTransportException: TTransportException(message="Could not connect to ('https://knox-devl.www.mysite.com/gateway/MYSITEHDO/hbaseversion/cluster', 9042)", type=1)
I also tried the phoenixdb library:
import phoenixdb
database_url = 'https://knox-devl.www.mysite.com:9042/gateway/MYSITEHDO/hbaseversion/cluster'
conn = phoenixdb.connect(database_url, autocommit=True)
cursor = conn.cursor()
cursor.execute("SHOW tables")
But I'm getting another error:
phoenixdb.errors.InterfaceError: ('RPC request failed', None, None, BadStatusLine("''",))
Exception phoenixdb.errors.InterfaceError: InterfaceError('RPC request failed', None, None, BadStatusLine("''",)) in <bound method Connection.__del__ of <phoenixdb.connection.Connection object at 0x10bc97d90>> ignored
The only way I can get some of the data is through curl:
curl -i -k -u guest:guest-password 'https://knox-devl.www.mysite.com:9042/gateway/MYSITEHDO/hbaseversion/cluster'
But there are no SQL commands there.
Does anyone know how to do this, or is there something I'm missing here, such as asking for a different URL or enabling something on the cluster?
As you identified, the only way to talk to HBase through Knox is via HBase's REST API. Happybase expects a plain host name rather than a URL and tries to connect directly to HBase over Thrift RPC, which Knox will block.
You can't use Happybase from outside a cluster with Knox enabled.
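As a rough Python equivalent of the curl command in the question, here is a minimal sketch using requests. The base URL, the `/hbase` path and the helper names `make_session` and `list_tables` are assumptions for illustration; use whatever endpoint and credentials your Knox admin gave you. It also assumes the REST server returns the table list as JSON under a `table` key when asked for `application/json`.

```python
import requests

# Assumed placeholder: substitute the gateway URL your Knox admin gave you.
BASE_URL = "https://knox-devl.www.mysite.com:9042/gateway/MYSITEHDO/hbase"


def make_session(user, password, verify=True):
    """Return a requests.Session set up for the HBase REST API behind Knox."""
    session = requests.Session()
    session.auth = (user, password)  # HTTP Basic auth, like curl -u
    session.verify = verify          # False mirrors curl -k (skips TLS checks)
    session.headers.update({"Accept": "application/json"})
    return session


def list_tables(session, base_url=BASE_URL):
    """GET the root resource and return the table names it lists."""
    response = session.get(base_url + "/")
    response.raise_for_status()
    # Assumed response shape: {"table": [{"name": "..."}, ...]}
    return [table["name"] for table in response.json()["table"]]
```

You would call it as `list_tables(make_session("guest", "guest-password", verify=False))`. Passing `verify=False` disables certificate validation, so it is better to point `verify` at the gateway's CA bundle once you have it.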
A good tutorial for using the HBase REST API with Python can be found here. In case the link ever dies, some of the most useful commands from this article are:
Look at a table's schema:
request = requests.get(baseurl + "/" + tablename + "/schema")
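If you send the schema request with an `Accept: application/json` header, the response can be parsed with `request.json()`. The sketch below assumes the JSON schema lists column families under a `ColumnSchema` key; the sample document and the helper name `column_families` are made up for illustration, so check them against what your cluster actually returns.

```python
import json


def column_families(schema):
    """Return the column family names from a parsed table schema."""
    # Assumed shape: {"name": "...", "ColumnSchema": [{"name": "..."}, ...]}
    return [family["name"] for family in schema.get("ColumnSchema", [])]


# Hand-written sample standing in for request.json()
sample = json.loads(
    '{"name": "mytable", "ColumnSchema": [{"name": "message"}, {"name": "username"}]}'
)
print(column_families(sample))
```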
Insert a row:
# encode() and the *columnencoded variables are defined elsewhere in the tutorial (not shown here)
cellset = Element('CellSet')
linenumber = 0
for line in shakespeare:
    rowKey = username + "-" + filename + "-" + str(linenumber).zfill(6)
    rowKeyEncoded = base64.b64encode(rowKey)
    row = SubElement(cellset, 'Row', key=rowKeyEncoded)
    messageencoded = base64.b64encode(line.strip())
    linenumberencoded = encode(linenumber)
    usernameencoded = base64.b64encode(username)
    # Add bleet cell
    cell = SubElement(row, 'Cell', column=messagecolumnencoded)
    cell.text = messageencoded
    # Add username cell
    cell = SubElement(row, 'Cell', column=usernamecolumnencoded)
    cell.text = usernameencoded
    # Add Line Number cell
    cell = SubElement(row, 'Cell', column=linenumbercolumnencoded)
    cell.text = linenumberencoded
    linenumber = linenumber + 1
# Submit XML to REST server
request = requests.post(baseurl + "/" + tablename + "/fakerow", data=tostring(cellset), headers={"Content-Type" : "text/xml", "Accept" : "text/xml"})
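The insert snippet is written for Python 2 and relies on names defined elsewhere in the tutorial. Here is a self-contained Python 3 sketch of the same idea: row keys, column names and values are all base64-encoded before they go into the CellSet XML. The helpers `b64` and `build_cellset` and the example column names are hypothetical, not part of the tutorial.

```python
import base64
from xml.etree.ElementTree import Element, SubElement, tostring


def b64(text):
    """Base64-encode a str and return the result as a str."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def build_cellset(row_key, cells):
    """Build CellSet XML (as bytes) for a single row.

    cells maps "family:qualifier" column names to string values.
    """
    cellset = Element("CellSet")
    row = SubElement(cellset, "Row", key=b64(row_key))
    for column, value in cells.items():
        cell = SubElement(row, "Cell", column=b64(column))
        cell.text = b64(value)
    return tostring(cellset)


# Hypothetical row key and column names, for illustration only
xml_bytes = build_cellset(
    "guest-hamlet-000001",
    {"message:text": "hello", "username:name": "guest"},
)
```

The resulting `xml_bytes` can be passed as `data=` to the same `requests.post` call as above, with the `Content-Type: text/xml` header.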
Delete a table:
request = requests.delete(baseurl + "/" + tablename + "/schema")