When I need to work with I/O (Query DB, Call to the third API,...), I can use RichAsyncFunction. But I need to interact with Google Sheet via GG Sheet API: https://developers.google.com/sheets/api/quickstart/java. This API is sync. I wrote below code snippet:
public class SendGGSheetFunction extends RichAsyncFunction<Obj, String> {
@Override
public void asyncInvoke(Obj message, final ResultFuture<String> resultFuture) {
CompletableFuture.supplyAsync(() -> {
syncSendToGGSheet(message);
return "";
}).thenAccept((String result) -> {
resultFuture.complete(Collections.singleton(result));
});
}
}
But I found that message send to GGSheet very slow, It seems to send by synchronous.
Most of the code executed by users in AsyncIO
is sync originally. You just need to ensure, it's actually executed in a separate thread. Most commonly a (statically shared) ExecutorService
is used.
private class SendGGSheetFunction extends RichAsyncFunction<Obj, String> {
private transient ExecutorService executorService;
@Override
public void open(Configuration parameters) throws Exception {
super.open(parameters);
executorService = Executors.newFixedThreadPool(30);
}
@Override
public void close() throws Exception {
super.close();
executorService.shutdownNow();
}
@Override
public void asyncInvoke(final Obj message, final ResultFuture<String> resultFuture) {
executorService.submit(() -> {
try {
resultFuture.complete(syncSendToGGSheet(message));
} catch (SQLException e) {
resultFuture.completeExceptionally(e);
}
});
}
}
Here are some considerations on how to tune AsyncIO to increase throughput: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-Async-IO-operator-tuning-micro-benchmarks-td35858.html