I am working on a project where I would like to read the data from the HBase system. I read that there are various HBase clients available, the default Java Client, Thrift, Avro, etc.
Now I am confused: if I choose the default Java client, will I be able to read data that was stored in HBase using the Thrift client?
My understanding is that if I use the Thrift client to read data from HBase, it will use the Thrift deserializer to convert the data from binary to the appropriate type. If that is true, will data loaded using the Thrift client appear corrupted when I read it with the default HBase client?
Thanks for your help!! ~Rohit
If you are developing your HBase application in Java, I recommend using the native HBase Java API, which is more powerful than Thrift, REST, Avro, etc.
The Java client uses ZooKeeper to locate the cluster's region servers and then talks to them directly, without going through an intermediate gateway server.
If you are not using Java, then you have to go with one of the other protocols - Thrift, REST, Avro, etc. For example, Python has some libraries for Thrift (I recommend HappyBase) as well as REST. So do Ruby and other languages.
If you insert data using the Java API directly, you will be able to retrieve exactly the same bytes using Thrift from Python or Ruby. The one thing to watch is the data format: HBase stores everything as raw bytes and does not record types, so the writer and the reader must agree on how strings, ints, Unicode strings, etc. are encoded.
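To make the byte-format point concrete, here is a minimal sketch in plain Java (no HBase dependency, so it runs anywhere). The class and method names are made up for illustration. It assumes the writer encodes ints as 4 big-endian bytes, which is the layout HBase's `Bytes.toBytes(int)` produces, and strings as UTF-8. It shows that the number 42 and the text "42" are different cell values, so a reader on any client has to decode with the same convention the writer used.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, for illustration only; not part of HBase.
class CellEncodingSketch {

    // Encode an int as 4 big-endian bytes (ByteBuffer's default byte order).
    static byte[] encodeInt(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Decode 4 big-endian bytes back into an int.
    static int decodeInt(byte[] raw) {
        return ByteBuffer.wrap(raw).getInt();
    }

    // Encode a string as UTF-8 bytes.
    static byte[] encodeString(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decode UTF-8 bytes back into a string.
    static String decodeString(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] asInt = encodeInt(42);        // what a writer storing a binary int would put in the cell
        byte[] asText = encodeString("42");  // what a writer storing text would put in the cell

        System.out.println(Arrays.toString(asInt));   // [0, 0, 0, 42]
        System.out.println(Arrays.toString(asText));  // [52, 50]

        // Round trips work as long as both sides use the same convention.
        System.out.println(decodeInt(asInt));      // 42
        System.out.println(decodeString(asText));  // 42
    }
}
```

Nothing is corrupted in either case: the bytes that come back are the bytes that went in. The data only looks wrong if one side writes a binary int and the other side reads it as text, or the other way round. A common way to sidestep this across languages is to store values as UTF-8 strings, or to document the binary layout and decode it explicitly on the non-Java side.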