Use Proguard for Scala AWS Lambda

I have a question about using ProGuard with a Scala AWS Lambda function. I have created a very simple Lambda function like this:

package example

import scala.collection.JavaConverters._
import com.amazonaws.services.lambda.runtime.events.S3Event
import com.amazonaws.services.lambda.runtime.Context

object Main extends App {

  def kinesisEventHandler(event: S3Event, context: Context): Unit = {
    val result = event.getRecords.asScala.map(m => m.getS3.getObject.getKey)
    println(result)
  }

}

I have added the following dependencies:

"com.amazonaws" % "aws-lambda-java-core" % "1.1.0"
"com.amazonaws" % "aws-lambda-java-events" % "1.3.0"

When I build a fat jar, it is 13 MB in size and works as expected as an AWS Lambda function (it only prints test output).

13 MB seemed very big to me, so I tried ProGuard to shrink the jar. It isn't working: I keep running into problems, and after two days I have run out of ideas.

Here is my ProGuard configuration:

-injars "/Users/x/x/x/AWS_Lambda/target/scala-2.12/lambda-demo-assembly-1.0.jar"
-libraryjars "/Users/x/x/x/AWS_Lambda/lib_managed/jars/org.scala-lang/scala-library/scala-library-2.12.1.jar"
-libraryjars "/Users/x/x/x/AWS_Lambda/lib_managed/jars/com.amazonaws/aws-lambda-java-core/aws-lambda-java-core-1.1.0.jar"
-libraryjars "/Library/Java/JavaVirtualMachines/jdk1.8.0_102.jdk/Contents/Home/jre/lib/rt.jar"
-libraryjars "/Users/x/x/x/AWS_Lambda/lib_managed/jars/com.amazonaws/aws-java-sdk-s3/aws-java-sdk-s3-1.11.0.jar"
-libraryjars "/Users/x/x/x/AWS_Lambda/lib_managed/jars/com.amazonaws/aws-lambda-java-events/aws-lambda-java-events-1.3.0.jar"
-outjars "/Users/x/x/x/AWS_Lambda/target/scala-2.12/proguard/lambda-demo_2.12-1.0.jar"
-dontoptimize
-dontobfuscate
-dontnote
-dontwarn

-keepattributes SourceFile,LineNumberTable

# Preserve all annotations.

-keepattributes *Annotation*

# Preserve all public applications.

-keepclasseswithmembers public class * {
    public static void main(java.lang.String[]);
}

# Preserve some classes and class members that are accessed by means of
# introspection.

-keep class * implements org.xml.sax.EntityResolver

-keepclassmembers class * {
    ** MODULE$;
}

-keepclassmembernames class scala.concurrent.forkjoin.ForkJoinPool {
    long eventCount;
    int  workerCounts;
    int  runControl;
    scala.concurrent.forkjoin.ForkJoinPool$WaitQueueNode syncStack;
    scala.concurrent.forkjoin.ForkJoinPool$WaitQueueNode spareStack;
}

-keepclassmembernames class scala.concurrent.forkjoin.ForkJoinWorkerThread {
    int base;
    int sp;
    int runState;
}

-keepclassmembernames class scala.concurrent.forkjoin.ForkJoinTask {
    int status;
}

-keepclassmembernames class scala.concurrent.forkjoin.LinkedTransferQueue {
    scala.concurrent.forkjoin.LinkedTransferQueue$PaddedAtomicReference head;
    scala.concurrent.forkjoin.LinkedTransferQueue$PaddedAtomicReference tail;
    scala.concurrent.forkjoin.LinkedTransferQueue$PaddedAtomicReference cleanMe;
}

# Preserve some classes and class members that are accessed by means of
# introspection in the Scala compiler library, if it is processed as well.

#-keep class * implements jline.Completor
#-keep class * implements jline.Terminal

#-keep class scala.tools.nsc.Global

#-keepclasseswithmembers class * {
#    <init>(scala.tools.nsc.Global);
#}

#-keepclassmembers class * {
#    *** scala_repl_value();
#    *** scala_repl_result();
#}

# Preserve all native method names and the names of their classes.

-keepclasseswithmembernames,includedescriptorclasses class * {
    native <methods>;
}

# Preserve the special static methods that are required in all enumeration
# classes.

-keepclassmembers,allowoptimization enum * {
    public static **[] values();
    public static ** valueOf(java.lang.String);
}

# Explicitly preserve all serialization members. The Serializable interface
# is only a marker interface, so it wouldn't save them.
# You can comment this out if your application doesn't use serialization.
# If your code contains serializable classes that have to be backward
# compatible, please refer to the manual.

-keepclassmembers class * implements java.io.Serializable {
    static final long serialVersionUID;
    static final java.io.ObjectStreamField[] serialPersistentFields;
    private void writeObject(java.io.ObjectOutputStream);
    private void readObject(java.io.ObjectInputStream);
    java.lang.Object writeReplace();
    java.lang.Object readResolve();
}

# Your application may contain more items that need to be preserved;
# typically classes that are dynamically created using Class.forName:

# -keep public class mypackage.MyClass
# -keep public interface mypackage.MyInterface
# -keep public class * implements mypackage.MyInterface

-keep,includedescriptorclasses class example.** { *; }

-keepclassmembers class * {
    <init>(...);
}

When I run this, the resulting jar is much smaller (around 5 MB), but when I invoke the Lambda I get the following error:

"errorMessage": "java.lang.NoSuchMethodException: com.amazonaws.services.s3.event.S3EventNotification.parseJson(java.lang.String)",
"errorType": "lambdainternal.util.ReflectUtil$ReflectException"

I had a look at the class, and ProGuard had removed this method. When I changed the configuration to keep that class as well, I got a different error in another class.

Has anybody already used ProGuard with a Scala AWS Lambda function and found a working configuration, or does anyone know about this problem? Is there any other good way to shrink the jar size?

Best, Lothium

Solution

Honestly, 13 MB isn't that big. But, as much as I'm sure this will be considered heresy by a Scala developer, I created an equivalent method in Java and the jar came to a bit over 7 MB. I didn't try ProGuard on it; it may shrink further.

That was with the same S3Event package you're using. If you look at what gets included because of that package, it brings in tons of extra stuff: SQS, SNS, Dynamo and so on. Ultimately that is the biggest part of the jar. As a little test, I eliminated all libraries except aws-lambda-java-core and used JsonPath instead. That got my jar file down to 458K.

My code is below. I know it's not Scala, but perhaps you can get some ideas from it. The key was eliminating as many AWS libraries as possible. Of course, if you want to do anything more than print keys in your Lambda, you'll need to bring in more AWS libraries, which, again, puts the size at about 7 MB.

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.List;

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestStreamHandler;
import com.jayway.jsonpath.JsonPath;


public class S3EventLambdaHandler implements RequestStreamHandler {
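    @Override
    public void handleRequest(InputStream inputStream, OutputStream outputStream, Context context) {

        try {
            // Pull every object key straight out of the raw S3 event JSON,
            // so the S3Event class (and its dependencies) is not needed.
            List<String> keys = JsonPath.read(inputStream, "$.Records[*].s3.object.key");

            for (String nextKey : keys) {
                System.out.println(nextKey);
            }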
        }
        catch( IOException ioe ) {
            context.getLogger().log("caught IOException reading input stream");
        }
    }
}
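
As for the NoSuchMethodException in the question: the error type (ReflectUtil$ReflectException) suggests the Lambda runtime looks up parseJson by name through reflection. A shrinker that only follows static references cannot see such a call, so it treats the method as unused and removes it. Below is a minimal sketch of that failure mode using only the JDK. The class EventNotification and the method name removedMethod are hypothetical stand-ins for illustration, not the real AWS classes.

```java
import java.lang.reflect.Method;

public class ReflectiveLookupDemo {

    // Hypothetical stand-in for a library class whose method is only
    // ever reached by name at runtime, never by a direct call.
    public static class EventNotification {
        public static String parseJson(String json) {
            return "parsed:" + json;
        }
    }

    // Returns true if `owner` has a public method `name(String)`.
    public static boolean hasStringMethod(Class<?> owner, String name) {
        try {
            owner.getMethod(name, String.class);
            return true;
        } catch (NoSuchMethodException e) {
            // This is the situation after a shrinker has removed the method.
            return false;
        }
    }

    // Invokes a public static method `name(String)` purely by name.
    public static Object callByName(Class<?> owner, String name, String arg) {
        try {
            Method method = owner.getMethod(name, String.class);
            return method.invoke(null, arg); // null receiver: static method
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("reflective call failed: " + name, e);
        }
    }

    public static void main(String[] args) {
        // The method exists, so the lookup by name succeeds.
        System.out.println(hasStringMethod(EventNotification.class, "parseJson"));
        // A method that is absent (as if shrunk away) cannot be found.
        System.out.println(hasStringMethod(EventNotification.class, "removedMethod"));
        System.out.println(callByName(EventNotification.class, "parseJson", "{}"));
    }
}
```

Nothing in this program calls parseJson directly, which is exactly why a static analysis would consider it dead code. With ProGuard the usual remedy is an explicit -keep rule for each class reached this way, though, as the question found, fixing one class can simply expose the next one.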