I have an application that is basically an HBase MapReduce job using Apache Gora. My case is very simple: I want to copy the data from one HBase table to a new table. Where do I specify the new table name? I have reviewed this guide but could not find where to put it. The following is the code snippet:
/* Mappers are initialized with GoraMapper.initMapper() or
* GoraInputFormat.setInput()*/
GoraMapper.initMapperJob(job, inStore, TextLong.class, LongWritable.class,
LogAnalyticsMapper.class, true);
/* Reducers are initialized with GoraReducer#initReducer().
* If the output is not to be persisted via Gora, any reducer
* can be used instead. */
GoraReducer.initReducerJob(job, outStore, LogAnalyticsReducer.class);
A plain MapReduce job would be very easy for this case.
I will redirect you to the tutorial, but I will try to clarify here :)
The table name is defined in your mappings. Check Table Mappings. You may have a file called gora-hbase-mapping.xml where the mapping is defined.
There should be something like this:
<table name="Nameofatable">
<class name="blah.blah.EntityA" keyClass="java.lang.Long" table="Nameofatable">
There you configure the table name (put the same name if you find both). There can be several <table> and <class> elements, maybe one for your input and one for your output.
AFTER that, you have to instantiate your input and output data stores, inStore and outStore. The tutorial got a bit messy and the creation of inStore and outStore ended up in the wrong section. You just do something like:
inStore = DataStoreFactory.getDataStore(String.class, EntityA.class, hadoopConf);
outStore = DataStoreFactory.getDataStore(Long.class, OtherEntity.class, hadoopConf);
To explain it the other way around:
DataStoreFactory.getDataStore(key class, entity class, conf)
takes an entity class. For the matching <class name="blah.blah.EntityA" element in the mapping, the table= attribute is your table name :) So: you define an entity as input with its table name, and you define an entity as output with its table name.
If the entity class is the same but the table names are different, the only solution I can think of is to create two classes, Entity1 and Entity2, with the same schema, and to define two <table> and two <class> elements in your gora-hbase-mapping.xml. Then instantiate the stores like:
inStore = DataStoreFactory.getDataStore(String.class, Entity1.class, hadoopConf);
outStore = DataStoreFactory.getDataStore(String.class, Entity2.class, hadoopConf);
It is not very clean but it should work :\
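For illustration, the two-class mapping described above might look roughly like the sketch below. The table names (source_table, target_table), the column family (content), and the field (someField) are made up for this example, and the root element name is an assumption that can differ between Gora versions, so check it against the mapping file shipped with your version.

```xml
<!-- Hypothetical gora-hbase-mapping.xml: all table, family and field names are placeholders -->
<gora-otd>
  <!-- Input table, read through Entity1 -->
  <table name="source_table">
    <family name="content"/>
  </table>
  <!-- Output table, written through Entity2 -->
  <table name="target_table">
    <family name="content"/>
  </table>

  <!-- Same schema in both classes; only the table attribute differs -->
  <class name="blah.blah.Entity1" keyClass="java.lang.String" table="source_table">
    <field name="someField" family="content" qualifier="someField"/>
  </class>
  <class name="blah.blah.Entity2" keyClass="java.lang.String" table="target_table">
    <field name="someField" family="content" qualifier="someField"/>
  </class>
</gora-otd>
```

With a mapping like this, the store created for Entity1 would read from source_table and the store created for Entity2 would write to target_table.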
EDIT 2 (not for this question):
If the source table and the destination table are the same, there is a version of initReducerJob that allows this behavior. An example is in Nutch's GeneratorJob.java:
StorageUtils.initMapperJob(currentJob, fields, SelectorEntry.class, WebPage.class, GeneratorMapper.class, SelectorEntryPartitioner.class, true);
StorageUtils.initReducerJob(currentJob, GeneratorReducer.class);