Tags: java, apache-spark, hbase

Spark access Row object value


I want to iterate over a DataFrame by partition and, for each partition, iterate over all of its rows and build a deleteList that contains one HBase Delete object per row. I'm using Spark and HBase with Java, and I create a Row object with the following code:

df.foreachPartition((ForeachPartitionFunction<Row>) iterator -> {
  while (iterator.hasNext()) {
    Row row = RowFactory.create(iterator.next());
    deleteList.add(new Delete(Bytes.toBytes(String.valueOf(row))));
  }
});

But it doesn't work, because I cannot access the row's value correctly. The DataFrame df has a single column named "hbase_key".


Solution

  • It's hard to tell from your post exactly which class Row is, but I suspect it is org.apache.spark.sql.Row.

    If that's the case, try methods such as getString(i), where i is the index of the column you want to read from the row.

    Depending on how you have configured your HBase access, I suspect that in your case index 0 holds the row key of the physical HBase table, and the subsequent indices hold the column values returned with that row. That does depend, though, on how exactly you arrived at this point in your code.

    Your Row object also has accessors for other data types, such as getInt(i).
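To make the suggestion concrete, here is a minimal sketch, assuming Row is org.apache.spark.sql.Row and that "hbase_key" is a string column at index 0. In that case iterator.next() already returns a Row, so wrapping it in RowFactory.create only produces a new row whose single field is the original row. The class name HBaseKeyDeleter, the method names, and the table name "my_table" are hypothetical and not from the question. The sketch also builds the list inside the partition function, on the assumption that a list declared on the driver would not receive elements added on the executors.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.spark.api.java.function.ForeachPartitionFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

public class HBaseKeyDeleter {

    // Builds one Delete per row, reading the key from column 0
    // (assumed to be the string column "hbase_key").
    public static List<Delete> toDeletes(Iterator<Row> rows) {
        List<Delete> deletes = new ArrayList<>();
        while (rows.hasNext()) {
            Row row = rows.next();          // already a Row, no RowFactory needed
            if (row.isNullAt(0)) {
                continue;                   // skip rows without a key
            }
            // When the row carries a schema, row.getAs("hbase_key") is an alternative.
            String key = row.getString(0);
            deletes.add(new Delete(Bytes.toBytes(key)));
        }
        return deletes;
    }

    // Opens one HBase connection per partition and deletes that partition's keys.
    // "my_table" is a hypothetical table name.
    public static void deleteKeys(Dataset<Row> df) {
        df.foreachPartition((ForeachPartitionFunction<Row>) rows -> {
            try (Connection connection =
                     ConnectionFactory.createConnection(HBaseConfiguration.create());
                 Table table = connection.getTable(TableName.valueOf("my_table"))) {
                List<Delete> deletes = toDeletes(rows);
                if (!deletes.isEmpty()) {
                    table.delete(deletes);
                }
            }
        });
    }
}
```

With this in place, calling HBaseKeyDeleter.deleteKeys(df) would replace the foreachPartition call from the question. Keeping the row-to-Delete conversion in its own method lets it be checked without a cluster, for example with rows built via RowFactory.create("k1").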