playframework · playframework-2.0 · amazon-dynamodb

DynamoDB async access from Play Framework


I need to read from and write to DynamoDB from a Play Framework application. There are already a couple of questions on this topic (here, here and here); however, all of them are at least 3 years old.

The answer to those questions is usually to use a wrapper (AWScala by seratch) or a library dedicated to Play.

However, the wrapper simply calls the synchronous versions of the SDK under the hood. If possible, I would like to be able to update the AWS SDK as soon as a new version comes out, without depending on the Scala/Play library being updated first. So the best alternative for me turns out to be the aws-scala-sdk wrapper generator by awslabs. The async wrapper uses, for example, the Future&lt;PutItemResult&gt; putItemAsync(PutItemRequest putItemRequest, AsyncHandler&lt;PutItemRequest,PutItemResult&gt; asyncHandler) method, which still returns a Java Future, but it is also possible to use the AsyncHandler callbacks to complete a Scala Promise and thereby drive a Scala Future:

import scala.concurrent.Promise
import com.amazonaws.handlers.AsyncHandler
import com.amazonaws.services.dynamodbv2.model.{PutItemRequest, PutItemResult}

val promise = Promise[PutItemResult]()
// Complete the Scala Promise from the SDK callbacks instead of blocking on the Java Future
dynamoDBAsync.putItemAsync(request, new AsyncHandler[PutItemRequest, PutItemResult]() {
  override def onSuccess(request: PutItemRequest, result: PutItemResult) = promise.success(result)
  override def onError(exception: Exception) = promise.failure(exception)
})
promise.future

Code like this is generated by the aws-scala-sdk generator. Is this approach safe to use with Play and the default ExecutionContext, or does it still suffer from the same thread-blocking problem as calling get() on the Java Future?
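For context, here is a minimal sketch of how such a promise-backed future could be consumed from a Play controller with Action.async, so that no thread ever waits on the Java Future. This is illustrative only: the ItemController, putItem and create names, the injected AmazonDynamoDBAsync binding, and the way the PutItemRequest is built are assumptions, not part of the generated wrapper, and it presumes a recent Play 2.x controller with dependency injection:

import javax.inject.Inject
import scala.concurrent.{ExecutionContext, Future, Promise}
import play.api.mvc._
import play.api.libs.json.JsValue
import com.amazonaws.handlers.AsyncHandler
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBAsync
import com.amazonaws.services.dynamodbv2.model.{PutItemRequest, PutItemResult}

class ItemController @Inject()(cc: ControllerComponents, dynamoDBAsync: AmazonDynamoDBAsync)
                              (implicit ec: ExecutionContext) extends AbstractController(cc) {

  // Bridge the SDK callback into a Scala Future (same pattern as the snippet above)
  private def putItem(request: PutItemRequest): Future[PutItemResult] = {
    val promise = Promise[PutItemResult]()
    dynamoDBAsync.putItemAsync(request, new AsyncHandler[PutItemRequest, PutItemResult] {
      override def onSuccess(req: PutItemRequest, res: PutItemResult): Unit = promise.success(res)
      override def onError(e: Exception): Unit = promise.failure(e)
    })
    promise.future
  }

  // Action.async hands the Future straight back to Play; the request thread is never blocked
  def create: Action[JsValue] = Action.async(parse.json) { request =>
    val putRequest = new PutItemRequest() // hypothetical: fill in table name and item from request.body
    putItem(putRequest)
      .map(_ => Ok)
      .recover { case e: Exception => InternalServerError(e.getMessage) }
  }
}

Mapping and recovering on the returned future only runs short, non-blocking callbacks, which is why the default ExecutionContext is a reasonable fit here.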


Solution

  • After spending a lot of time stress-testing DynamoDB with Play, I am fairly confident in using com.amazonaws.handlers.AsyncHandler.

    Test setup: one server instance (type varying) with a couple of dedicated requester instances (m4.large). Each request contains a JSON payload of about 200 bytes which is written to DynamoDB. Each requester instance starts several threads which handle the actual requests. The requests are spread evenly over a period of time, and each requester instance ramps its threads up in a staggered manner so that no requests are throttled by DynamoDB (table with 10000 provisioned write capacity units; according to CloudWatch no requests were ever throttled). ulimit -n (number of open files) was increased to 20000 on the server instance, because above roughly 3500-4000 requester threads the server otherwise behaved strangely (the process no longer listened on port 9000, yet some threads were still able to dispatch requests while others were not). Max Java heap size was 8 GB.

    My findings are as follows:

    • m4.large server instance: Below 2500-2700 requests/s, response times average under 60 ms. Write throughput caps out at about 3700-3800 requests/s; the bottleneck is CPU, with load constantly at 100%.

    • m4.4xlarge server instance: No matter how many requester threads, response times stay below 50 ms, averaging around 10 ms or lower. I was able to max out the 10000 provisioned write capacity units...