Tags: java, scala, hadoop, hbase, scalding

Alternatives to scalding for HBase access from Scala (or Java)


Could anybody please recommend a good solution (framework) for accessing HBase on a Hadoop cluster from a Scala (or Java) application?

So far I have been moving in the Scalding direction. The prototypes I built let me combine the Scalding library with Maven and separate the Scalding job JAR from the 'library' code packages. This in turn lets me run Scalding-based Hadoop jobs from outside the cluster with minimal per-job overhead: the 'library' code is pushed to the cluster's distributed cache only when it changes (which is rarely), so I can submit job code quickly.

Now that I'm actually starting to work with HBase itself, I see that Scalding is good but not particularly 'native' to HBase. Yes, there are projects like hbase-scalding, but since I'm at the point of planning future work anyway, I'd like to hear about other good solutions I may have missed.
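
For reference, the kind of job I have in mind is roughly the following minimal Scalding sketch (the class name, paths and field names are just placeholders, not my actual code):

    import com.twitter.scalding._

    // Minimal Scalding job: reads lines, counts words, writes TSV.
    // Submitted from outside the cluster roughly as:
    //   hadoop jar my-jobs.jar com.twitter.scalding.Tool \
    //     com.example.WordCountJob --hdfs --input in.txt --output out.tsv
    class WordCountJob(args: Args) extends Job(args) {
      TextLine(args("input"))
        .flatMap('line -> 'word) { line: String => line.split("\\s+") }
        .groupBy('word) { _.size }
        .write(Tsv(args("output")))
    }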

What is expected:

  • Job startup overhead should be low; I need to run a lot of them.
  • It should be possible (the easier the better) to run jobs from outside the cluster without any SSH, just via the 'hadoop jar' command or even simply by executing the application.
  • The job language itself should allow short, logical semantics. Ideally the code should be simple enough to be generated automatically.
  • The solution should stay productive on reasonably large HBase tables (initially up to 100,000,000 rows).
  • The solution should be 'live' (actively developed) but still reasonably stable.

I think the reasoning behind an answer could be even more useful than the answer itself, and this question should give quite a few people a couple of ideas. Any piece of advice?


Solution

  • HPaste http://www.gravity.com/labs/hpaste/ may be what you are looking for.
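
Whatever framework you settle on ultimately wraps the standard HBase client API, so it helps to know what the baseline looks like. Here is a rough sketch of direct access from Scala with the plain client (HBase 0.9x-era API; the table, family and column names below are made up):

    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.client.{Get, HTable, Put}
    import org.apache.hadoop.hbase.util.Bytes

    object DirectHBaseAccess {
      def main(args: Array[String]): Unit = {
        val conf  = HBaseConfiguration.create()   // picks up hbase-site.xml from the classpath
        val table = new HTable(conf, "my_table")  // hypothetical table name
        try {
          // Write one cell: row "row1", family "cf", qualifier "q", value "hello".
          val put = new Put(Bytes.toBytes("row1"))
          put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("hello"))
          table.put(put)

          // Read it back.
          val result = table.get(new Get(Bytes.toBytes("row1")))
          val value  = Bytes.toString(result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("q")))
          println("row1/cf:q = " + value)
        } finally {
          table.close()
        }
      }
    }

  DSLs like HPaste mostly exist to hide this byte[]/Bytes boilerplate behind typed schemas, which is exactly what makes them attractive for short, generatable job code.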