
Storm-HBase Trident - Query multiple columns simultaneously


I am building a Trident topology that queries HBaseState. I am using the org.apache.storm.hbase package.

My understanding (correct me if I'm wrong) is that HBaseQuery reads all column values (or only those specified in the projectionCriteria) for a given rowKey and outputs each column as a separate tuple with Fields("columnName","columnValue").

For example, if I had a table of pets, with the pet's name as the rowKey and columns "type" and "age", stateQuery would receive an input tuple Values("Fido") and output two separate tuples:

Values("Fido","Type","Dog")

Values("Fido","Age",11)
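The per-column behavior described above can be modeled without a cluster. The sketch below is plain Java, not the Storm or HBase API: a row is a `TreeMap` from column name to raw bytes (standing in for HBase's sorted cells), each "tuple" is a `List<Object>`, and the class and method names are hypothetical. Like a mapper that loops over every cell, it emits one tuple per column.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical stdlib-only model of a per-cell value mapper; not the Storm or HBase API.
class PerColumnModel {
    // Emits one (rowKey, columnName, rawValue) tuple per cell, in the row's sorted column order.
    static List<List<Object>> perColumnTuples(String rowKey, SortedMap<String, byte[]> row) {
        List<List<Object>> tuples = new ArrayList<>();
        for (Map.Entry<String, byte[]> cell : row.entrySet()) {
            tuples.add(Arrays.<Object>asList(rowKey, cell.getKey(), cell.getValue()));
        }
        return tuples;
    }

    public static void main(String[] args) {
        SortedMap<String, byte[]> fido = new TreeMap<>();
        fido.put("Type", "Dog".getBytes(StandardCharsets.UTF_8));
        fido.put("Age", ByteBuffer.allocate(4).putInt(11).array()); // 4-byte big-endian int
        for (List<Object> t : perColumnTuples("Fido", fido)) {
            System.out.println(t.get(0) + " " + t.get(1)); // prints "Fido Age", then "Fido Type"
        }
    }
}
```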

Some questions:

  1. Is there a way to get values from multiple columns in one query? That is, can I get a single output tuple with Fields("Name","column1Value","column2Value")?

  2. If so, does that still work when the columns have different types (e.g. one is a String and the other an Integer)?

Ultimately, my goal is to take input tuples with Fields("Name") and get single output tuples with Fields("Name","Type","Age"), for example Values("Fido","Dog",11) and Values("Mr. Kibbles","Cat",4). If that isn't possible with the approach above, how can it be done?
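On question 2, mixing types in one tuple should not be a problem in itself: Storm's `Values` extends `ArrayList<Object>`. The plain-Java sketch below models the goal without Storm or HBase: `List<Object>` stands in for `Values`, the class and method names are hypothetical, and the stored values are assumed to have been written with HBase's `Bytes.toBytes`, i.e. a UTF-8 string and a 4-byte big-endian int. It decodes both columns of one row into a single tuple, then appends that tuple to the input tuple, assuming the query output keeps the input fields in front, as the Values("Fido","Type","Dog") example above suggests.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stdlib-only model; List<Object> plays the role of Storm's Values.
class WholeRowModel {
    // Decodes both columns of one row into a single mixed-type tuple: (type, age).
    // Assumes both columns are present in the row.
    static List<Object> toTuple(Map<String, byte[]> row) {
        List<Object> tuple = new ArrayList<>();
        tuple.add(new String(row.get("Type"), StandardCharsets.UTF_8)); // String column
        tuple.add(ByteBuffer.wrap(row.get("Age")).getInt());            // int column, boxed to Integer
        return tuple;
    }

    // Mimics the query output: input fields first, then the mapper's fields.
    static List<Object> appendToInput(List<Object> input, List<Object> mapped) {
        List<Object> out = new ArrayList<>(input);
        out.addAll(mapped);
        return out;
    }

    public static void main(String[] args) {
        Map<String, byte[]> fido = new TreeMap<>();
        fido.put("Type", "Dog".getBytes(StandardCharsets.UTF_8));
        fido.put("Age", ByteBuffer.allocate(4).putInt(11).array());
        List<Object> input = new ArrayList<>();
        input.add("Fido");
        System.out.println(appendToInput(input, toTuple(fido))); // prints [Fido, Dog, 11]
    }
}
```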

TIA for any help!


Solution

  • I solved the problem myself; posting here for posterity:

    The reason I was having difficulty is that I was building off the WordCountValueMapper without really understanding how it was used. Digging a little deeper into the Result class helped.

    Here's how I'm implementing it now:

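    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.hbase.Cell;
    import org.apache.hadoop.hbase.CellUtil;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.storm.hbase.bolt.mapper.HBaseValueMapper;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.tuple.Fields;
    import org.apache.storm.tuple.ITuple;
    import org.apache.storm.tuple.Values;

    public static class MyValueMapper implements HBaseValueMapper {
      @Override
      public List<Values> toValues(ITuple tuple, Result result) throws Exception {
        List<Values> values = new ArrayList<Values>();
        if (result.isEmpty()) return values; // row not found: emit no tuple
        // rawCells() is sorted by family, then qualifier; this assumes cells[0]
        // holds the "type" value (a String) and cells[1] the "age" value (an int).
        Cell[] cells = result.rawCells();
        values.add(new Values(Bytes.toString(CellUtil.cloneValue(cells[0])), Bytes.toInt(CellUtil.cloneValue(cells[1]))));
        return values; // a single tuple carrying both column values
      }
      @Override
      public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("type","age"));
      }
    }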
    

    The WordCountValueMapper iterates through each cell in the Result and emits one tuple per cell, which amounts to iterating through each column. Instead, I take the whole array of cells and pull both values out into a single tuple. Nothing very clever; I just didn't understand it before.
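One caveat on indexing into `rawCells()`: HBase returns a row's cells sorted by family and then qualifier, not in the order the columns were requested or declared. The stdlib-only sketch below (hypothetical names; a `TreeMap` stands in for one column family's sorted cells, which matches HBase's byte ordering for ASCII qualifiers) shows that with qualifiers "type" and "age" in the same family, "age" comes first, so `cells[0]` would not hold the type. Looking values up by qualifier, which the real client offers as `Result.getValue(family, qualifier)`, removes the dependence on position.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stdlib-only model of one column family's cells; not the HBase client API.
class CellOrderModel {
    // Qualifiers in sorted order, i.e. the order positional indexes would follow.
    static List<String> sortedQualifiers(TreeMap<String, byte[]> cells) {
        return new ArrayList<>(cells.keySet());
    }

    // Order-independent lookup by qualifier, the role Result.getValue(family, qualifier) plays.
    static List<Object> typeAndAge(Map<String, byte[]> cells) {
        List<Object> tuple = new ArrayList<>();
        tuple.add(new String(cells.get("type"), StandardCharsets.UTF_8));
        tuple.add(ByteBuffer.wrap(cells.get("age")).getInt());
        return tuple;
    }

    public static void main(String[] args) {
        TreeMap<String, byte[]> cells = new TreeMap<>();
        cells.put("type", "Dog".getBytes(StandardCharsets.UTF_8));
        cells.put("age", ByteBuffer.allocate(4).putInt(11).array());
        System.out.println(sortedQualifiers(cells)); // prints [age, type]
        System.out.println(typeAndAge(cells));       // prints [Dog, 11]
    }
}
```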