I have an application that is basically an HBase MapReduce job using Apache Gora. My case is very simple: I want to copy the data of one HBase table to a new table. Where do I specify the new table name? I have reviewed this guide but could not find where to put it. The following is the code snippet:
/* Mappers are initialized with GoraMapper.initMapper() or
* GoraInputFormat.setInput()*/
GoraMapper.initMapperJob(job, inStore, TextLong.class, LongWritable.class,
LogAnalyticsMapper.class, true);
/* Reducers are initialized with GoraReducer#initReducer().
* If the output is not to be persisted via Gora, any reducer
* can be used instead. */
GoraReducer.initReducerJob(job, outStore, LogAnalyticsReducer.class);
With a plain MapReduce job this case would be very easy.
I will redirect you to the tutorial, but I will try to clarify here :)
The table name is defined in your mappings. Check Table Mappings. You probably have a file called gora-hbase-mapping.xml where the mapping is defined.
There should be something like this:
<table name="Nameofatable">
...
<class name="blah.blah.EntityA" keyClass="java.lang.Long" table="Nameofatable">
There you configure the table name (put the same name if you find both). There can be several <table> and <class> elements, maybe one for your input and one for your output.
AFTER that, you have to instantiate your input/output datastores inStore and outStore. The tutorial got a bit messy, and the creation of inStore and outStore ended up in the wrong section. You just do something like:
inStore = DataStoreFactory.getDataStore(String.class, EntityA.class, hadoopConf);
outStore = DataStoreFactory.getDataStore(Long.class, OtherEntity.class, hadoopConf);
Explanation "the other way around":
You create a datastore with DataStoreFactory.getDataStore(key class, entity class, conf).
The entity class is looked up in gora-hbase-mapping.xml, in the element <class name="blah.blah.EntityA" ...>.
In that <class> element, the attribute table= is your table name :)
So: you define an entity as input with its table name, and you define an entity as output with its table name.
EDIT 1:
If the entity class is the same but the table names are different, the only solution I can think of is creating two classes, Entity1 and Entity2, with the same schema, and creating two <table> and two <class> elements in your gora-hbase-mapping.xml.
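As a rough sketch of what such a mapping could look like (not taken from the tutorial): the table names SourceTable and TargetTable, the column family f, and the field value are hypothetical placeholders, and the package blah.blah is the same placeholder used above. The root element name follows the Gora tutorial's mapping file; check it against the Gora version you are using.

```xml
<gora-otd>
  <!-- Input table (hypothetical name) -->
  <table name="SourceTable">
    <family name="f"/>
  </table>
  <!-- Output table (hypothetical name) -->
  <table name="TargetTable">
    <family name="f"/>
  </table>

  <!-- Entity1 is read from SourceTable -->
  <class name="blah.blah.Entity1" keyClass="java.lang.String" table="SourceTable">
    <field name="value" family="f" qualifier="value"/>
  </class>

  <!-- Entity2 has the same schema but is written to TargetTable -->
  <class name="blah.blah.Entity2" keyClass="java.lang.String" table="TargetTable">
    <field name="value" family="f" qualifier="value"/>
  </class>
</gora-otd>
```

The keyClass here is java.lang.String only because the stores for this case are created with String.class as the key class; use whatever key type your entities actually have.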
Then instantiate the stores like:
inStore = DataStoreFactory.getDataStore(String.class, Entity1.class, hadoopConf);
outStore = DataStoreFactory.getDataStore(String.class, Entity2.class, hadoopConf);
It is not very clean but it should work :\
EDIT 2 (not for this question):
If the source table and the destination table are the same, there is a version of initReducerJob that allows this behavior. An example is in Nutch's GeneratorJob.java:
StorageUtils.initMapperJob(currentJob, fields, SelectorEntry.class, WebPage.class, GeneratorMapper.class, SelectorEntryPartitioner.class, true);
StorageUtils.initReducerJob(currentJob, GeneratorReducer.class);