Is it possible to execute a query asynchronously in hive server?
For eg, How can I /Is it possible to do something like this from the client-
QueryHandle handle = executeAsyncQuery(hiveQuery);
Status status = handle.checkStatus();
if(status.isCompleted()) {
QueryResult result = handle.fetchResult();
}
I also had a look at How do I make an async call to Hive in Java?. But did not help. The answers were mostly around the thrift clients taking a callback argument.
Any help would be appreciated. Thanks!
[EDIT 1]
I went through the HiveConnection.java in hive-jdbc. hive-jdbc by default uses the async thrift APIs. Hence it submits a query and polls for result sets (look at HiveStatement.java). Now i am able to write a piece of code which is purely non blocking. But the problem is as soon as the client disconnect the foot print about the query is lost.
Client 1
final TCLIService.Client client = new TCLIService.Client(createBinaryTransport(host, port, loginTimeout, sessConf, false)); // from HiveConnection.java
TSessionHandle sessionHandle = openSession(client) // from HiveConnection.java
TExecuteStatementReq execReq = new TExecuteStatementReq(sessionHandle, sql);
execReq.setRunAsync(true);
execReq.setConfOverlay(sessConf);
final TGetOperationStatusReq handle = client.ExecuteStatement(execReq)
writeHandleToFile("~/handle", handle)
Client 2
final TGetOperationStatusReq handle = readHandleFromFile("~/handle")
final TCLIService.Client client = new TCLIService.Client(createBinaryTransport(host, port, loginTimeout, sessConf, false));
while (true) {
System.out.println(client.GetOperationStatus(handle).getOperationState());
Thread.sleep(1000);
}
Client 2 keeps printing FINISHED_STATE as long as Client 1 is alive. But if client 1 process completes or gets killed, client 2 starts printing null which means hiveserver2 is cleaning up the resources as soon as a client disconnects.
Is it possible to configure hiveserver2 to configure this clean up process based on time or something?
Thanks!
Did some research and figured out that this happens only with binary transport (tcp)
@Override
public void deleteContext(ServerContext serverContext,
TProtocol input, TProtocol output) {
Metrics metrics = MetricsFactory.getInstance();
if (metrics != null) {
try {
metrics.decrementCounter(MetricsConstant.OPEN_CONNECTIONS);
} catch (Exception e) {
LOG.warn("Error Reporting JDO operation to Metrics system", e);
}
}
ThriftCLIServerContext context = (ThriftCLIServerContext) serverContext;
SessionHandle sessionHandle = context.getSessionHandle();
if (sessionHandle != null) {
LOG.info("Session disconnected without closing properly, close it now");
try {
cliService.closeSession(sessionHandle);
} catch (HiveSQLException e) {
LOG.warn("Failed to close session: " + e, e);
}
}
}
The above stub (from ThriftBinaryCLIService) gets executed through this piece of code from TThreadPoolServer which is used by ThriftBinaryCLIService.
eventHandler.deleteContext(connectionContext, inputProtocol, outputProtocol);
Apparently http transport (ThriftHttpCLIService) has a different strategy of cleaning up operation handles (not greedy like tcp)
Will check with hive community on this to understand a bit more and see if there is an issue addressing this already.