Tags: serialization, jdbc, apache-phoenix

How do I write an array of type Binary (serialized objects) using Phoenix JDBC?


I am attempting to use the Phoenix Query Server to create a thin Phoenix JDBC client that reads and writes data to an underlying Phoenix/HBase database.

I am struggling to figure out how to write a column that is an array of type BINARY. The BINARY elements are serialized forms of a custom bean.

Relevant line from the schema defining the column: mySerializedArrayCol BINARY(800)[],
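For context, here is a trimmed-down sketch of the table DDL via the thin client (everything except mySerializedArrayCol is a placeholder, and the real table has more columns; the host and port in the URL are placeholders too):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class CreateTableSketch {
        public static void main(String[] args) throws Exception {
            // Thin-client JDBC URL for the Phoenix Query Server; host/port are placeholders.
            String url = "jdbc:phoenix:thin:url=http://localhost:8765;serialization=PROTOBUF";
            try (Connection conn = DriverManager.getConnection(url);
                 Statement stmt = conn.createStatement()) {
                // Trimmed-down DDL; the real table has seven columns.
                stmt.execute("CREATE TABLE IF NOT EXISTS myTable ("
                        + " id BIGINT NOT NULL PRIMARY KEY,"
                        + " mySerializedArrayCol BINARY(800)[]"
                        + ")");
            }
        }
    }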

I am using java.sql.PreparedStatement to build queries. Here is my best failed attempt so far:

        String SQL = "UPSERT INTO myTable VALUES(?,?,?,?,?,?,?)";

        Connection connection = getNewConnection();
        PreparedStatement pstmt = connection.prepareStatement(SQL);

        //...
        
        pstmt.setArray(5, connection.createArrayOf("BINARY",
                Arrays.stream(myObjectArray)
                        .map(obj -> SerializationUtils.serialize(obj))
                        .toArray(byte[][]::new)
        ));


        pstmt.executeUpdate();
        connection.commit();

which throws a NullPointerException at runtime when executeUpdate() runs:

java.lang.NullPointerException
    at org.apache.phoenix.schema.types.PArrayDataType.toBytes(PArrayDataType.java:142)
    at org.apache.phoenix.expression.LiteralExpression.newConstant(LiteralExpression.java:193)
    at org.apache.phoenix.expression.LiteralExpression.newConstant(LiteralExpression.java:174)
    at org.apache.phoenix.expression.LiteralExpression.newConstant(LiteralExpression.java:161)
    at org.apache.phoenix.compile.UpsertCompiler$UpdateColumnCompiler.visit(UpsertCompiler.java:888)
    at org.apache.phoenix.compile.UpsertCompiler$UpdateColumnCompiler.visit(UpsertCompiler.java:872)
    at org.apache.phoenix.parse.BindParseNode.accept(BindParseNode.java:47)
    at org.apache.phoenix.compile.UpsertCompiler.compile(UpsertCompiler.java:761)
    at org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:787)
    at org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:773)
    at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:403)
    at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:393)
    at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
    at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:392)
    at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:380)
    at org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:173)
    at org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:183)
    at org.apache.calcite.avatica.jdbc.JdbcMeta.execute(JdbcMeta.java:868)
    at org.apache.calcite.avatica.remote.LocalService.apply(LocalService.java:254)
    at org.apache.calcite.avatica.remote.Service$ExecuteRequest.accept(Service.java:1031)
    at org.apache.calcite.avatica.remote.Service$ExecuteRequest.accept(Service.java:1001)
    at org.apache.calcite.avatica.remote.AbstractHandler.apply(AbstractHandler.java:94)
    at org.apache.calcite.avatica.remote.ProtobufHandler.apply(ProtobufHandler.java:46)

I am using Phoenix 4.14.3 with HBase 1.4 and Java 8.

Notably, I am able to use the phoenix-spark connector in a separate Scala program to correctly write DataFrames into the same table, and I am also able to read from the same table over Phoenix JDBC and properly deserialize the array of objects. I'm really not sure why this particular write is failing, and I cannot find any documentation on how to handle arrays of BINARY / serialized objects.
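For reference, the read path that works looks roughly like this (a sketch, assuming the driver materializes the BINARY[] column as byte[][]; MyBean stands in for my custom bean):

    import java.sql.Array;
    import java.sql.ResultSet;
    import java.sql.Statement;
    import org.apache.commons.lang3.SerializationUtils;

    // Sketch of the read that works: fetch the BINARY[] column and
    // deserialize each element back into a bean.
    try (Statement stmt = connection.createStatement();
         ResultSet rs = stmt.executeQuery("SELECT mySerializedArrayCol FROM myTable")) {
        while (rs.next()) {
            Array sqlArray = rs.getArray("mySerializedArrayCol");
            for (byte[] bytes : (byte[][]) sqlArray.getArray()) {
                MyBean bean = SerializationUtils.deserialize(bytes);
                // ... use bean ...
            }
        }
    }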


Solution

  • I debugged this far enough to see that the error comes from the Phoenix server side; the client appears to construct the query objects as expected.

    I ended up changing my data schema: I wrapped the array in another object, so I now store a single serialized wrapper object instead of an array of serialized objects.

    So, the column in the Phoenix schema went from:

    mySerializedArrayCol BINARY(800)[]
    

    to:

    mySerializedArrayCol VARBINARY
    

    I suppose this is the more standard approach from a data-engineering perspective, considering I found zero posts from anyone attempting my initial schema above. But it required extra work in other components of the project, which had to be transformed to handle the new wrapped-array schema; a minimal sketch of the wrapper approach is below.

    Phoenix does claim to support arrays of type BINARY, so it would have been nice to get some closure. :shrug:
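    For completeness, a minimal sketch of the wrapper approach (MyBean stands in for the custom bean; the wrapper class and variable names are illustrative):

        import java.io.Serializable;
        import java.sql.PreparedStatement;
        import java.sql.ResultSet;
        import org.apache.commons.lang3.SerializationUtils;

        // Hypothetical wrapper: holding the array in one Serializable object
        // lets it be stored as a single VARBINARY value.
        public class MyBeanArrayWrapper implements Serializable {
            private static final long serialVersionUID = 1L;
            private final MyBean[] items;

            public MyBeanArrayWrapper(MyBean[] items) { this.items = items; }
            public MyBean[] getItems() { return items; }
        }

        // Writing: serialize the wrapper and bind it as plain bytes.
        pstmt.setBytes(5, SerializationUtils.serialize(new MyBeanArrayWrapper(myObjectArray)));

        // Reading: getBytes plus deserialize recovers the whole array at once.
        MyBeanArrayWrapper wrapper = SerializationUtils.deserialize(rs.getBytes("mySerializedArrayCol"));
        MyBean[] items = wrapper.getItems();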