Tags: java, hadoop, mapreduce, hbase, gora

Apache Gora: where to set the new table name in the reducer


I have an application that is basically an HBase MapReduce job using Apache Gora. My case is very simple: I want to copy the data from one HBase table to a new table. Where do I specify the new table's name? I have reviewed this guide but could not find where to put it. The relevant code snippet follows:

  /* Mappers are initialized with GoraMapper.initMapperJob() or
   * GoraInputFormat.setInput(). */
  GoraMapper.initMapperJob(job, inStore, TextLong.class, LongWritable.class,
      LogAnalyticsMapper.class, true);

  /* Reducers are initialized with GoraReducer.initReducerJob().
   * If the output is not to be persisted via Gora, any reducer 
   * can be used instead. */
  GoraReducer.initReducerJob(job, outStore, LogAnalyticsReducer.class);

With a plain MapReduce job (no Gora), this would be very easy.


Solution

  • I will redirect you to the tutorial, but I will try to clarify here :)

    The table name is defined in your mappings; see Table Mappings. You probably have a file called gora-hbase-mapping.xml where the mapping is defined. It should contain something like this:

    <table name="Nameofatable">
    ...
    <class name="blah.blah.EntityA" keyClass="java.lang.Long" table="Nameofatable">
    

    That is where you configure the table name (use the same name in both places if both elements are present). There can be several <table> and <class> elements, for example one pair for your input and one for your output.

    AFTER that, you have to instantiate your input and output datastores, inStore and outStore. The tutorial is a bit messy here: the creation of inStore and outStore ended up in the wrong section. You just do something like:

    inStore = DataStoreFactory.getDataStore(String.class, EntityA.class, hadoopConf);
    outStore = DataStoreFactory.getDataStore(Long.class, OtherEntity.class, hadoopConf);
    

    The same explanation, the other way around:

    • You instantiate the datastore with DataStoreFactory.getDataStore(key class, entity class, conf).
    • The requested entity class is looked up in gora-hbase-mapping.xml, in the element <class name="blah.blah.EntityA" ...>.
    • That <class> element has a table= attribute. That is your table name :)

    So: you define one entity as input with its table name, and you define another entity as output with its table name.
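
    To make the lookup concrete, here is a minimal, self-contained sketch of the idea. It is NOT Gora's actual implementation: it only uses the JDK's XML parser to show how an entity class name resolves to a table name through the table= attribute. The class MappingLookupSketch, the table names SourceTable and TargetTable, and the embedded mapping content are all made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Toy model of "entity class -> table name" resolution. Not Gora code.
final class MappingLookupSketch {

    // Hypothetical mapping content; entity and table names are assumptions.
    static final String MAPPING =
        "<gora-otd>"
      + "<class name=\"blah.blah.EntityA\" keyClass=\"java.lang.Long\" table=\"SourceTable\"/>"
      + "<class name=\"blah.blah.OtherEntity\" keyClass=\"java.lang.Long\" table=\"TargetTable\"/>"
      + "</gora-otd>";

    /** Returns a map from entity class name to table name, read from the <class> elements. */
    static Map<String, String> tablesByEntity(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList classes = doc.getElementsByTagName("class");
            Map<String, String> result = new LinkedHashMap<>();
            for (int i = 0; i < classes.getLength(); i++) {
                Element e = (Element) classes.item(i);
                // The table= attribute of each <class> is what decides the table.
                result.put(e.getAttribute("name"), e.getAttribute("table"));
            }
            return result;
        } catch (Exception ex) {
            throw new IllegalStateException("Could not parse mapping", ex);
        }
    }

    public static void main(String[] args) {
        Map<String, String> tables = tablesByEntity(MAPPING);
        System.out.println("input table:  " + tables.get("blah.blah.EntityA"));
        System.out.println("output table: " + tables.get("blah.blah.OtherEntity"));
    }
}
```

    The point of the sketch is only that the table is chosen by the entity class you pass to the datastore factory, not by anything in the mapper or reducer setup calls.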


    EDIT 1:

    If the entity class is the same but the table names are different, the only solution I can think of is to create two classes, Entity1 and Entity2, with the same schema, and to define two <table> and two <class> elements in your gora-hbase-mapping.xml. Then instantiate the stores like this:

    inStore = DataStoreFactory.getDataStore(String.class, Entity1.class, hadoopConf);
    outStore = DataStoreFactory.getDataStore(String.class, Entity2.class, hadoopConf);
    

    It is not very clean but it should work :\
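
    For that two-entity approach, the mapping file might look roughly like the fragment below. This is a sketch under assumptions, not a tested configuration: the table names, the column family data, and the field someField are hypothetical, and the root element name has differed between Gora versions (gora-otd in recent ones), so check it against the mapping file you already have.

```xml
<gora-otd>
  <!-- Source table, read through Entity1 (names are illustrative) -->
  <table name="SourceTable">
    <family name="data"/>
  </table>
  <class name="blah.blah.Entity1" keyClass="java.lang.String" table="SourceTable">
    <field name="someField" family="data" qualifier="someField"/>
  </class>

  <!-- Destination table, written through Entity2: same fields, different table -->
  <table name="TargetTable">
    <family name="data"/>
  </table>
  <class name="blah.blah.Entity2" keyClass="java.lang.String" table="TargetTable">
    <field name="someField" family="data" qualifier="someField"/>
  </class>
</gora-otd>
```

    The keyClass values here match the String.class key passed to getDataStore for both stores above.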


    EDIT 2 (not for this question):

    If the source table and the destination table are the same, there is a version of initReducerJob that allows this behavior. An example is in Nutch's GeneratorJob.java:

    StorageUtils.initMapperJob(currentJob, fields, SelectorEntry.class, WebPage.class, GeneratorMapper.class, SelectorEntryPartitioner.class, true);
    StorageUtils.initReducerJob(currentJob, GeneratorReducer.class);