apache-kafka apache-kafka-connect s3-kafka-connector

ClassNotFound exception running Kafka Connect S3 Source Connector


I am evaluating the Confluent Kafka S3 Source Connector and am stuck on an issue with the following stack trace:

[2020-12-22 15:27:41,636] ERROR WorkerConnector{id=s3-source-connector} Error while starting connector (org.apache.kafka.connect.runtime.WorkerConnector)
org.apache.kafka.connect.errors.ConnectException: Failed to get list of folders from S3 bucket - kafka-connect for key path - topics/ and delimiter - /
        at io.confluent.connect.s3.source.S3Storage.listFolders(S3Storage.java:286)
        at io.confluent.connect.s3.source.S3Storage.getPartitions(S3Storage.java:98)
        at io.confluent.connect.storage.partitioner.TimeBasedPartitioner.getPartitions(TimeBasedPartitioner.java:50)
        at io.confluent.connect.cloud.storage.source.StorageSourceConnector.doStart(StorageSourceConnector.java:77)
        at io.confluent.connect.cloud.storage.source.StorageSourceConnector.start(StorageSourceConnector.java:69)
        at org.apache.kafka.connect.runtime.WorkerConnector.doStart(WorkerConnector.java:111)
        at org.apache.kafka.connect.runtime.WorkerConnector.start(WorkerConnector.java:136)
        at org.apache.kafka.connect.runtime.WorkerConnector.transitionTo(WorkerConnector.java:196)
        at org.apache.kafka.connect.runtime.Worker.startConnector(Worker.java:242)
        at org.apache.kafka.connect.runtime.distributed.DistributedHerder.startConnector(DistributedHerder.java:908)
        at org.apache.kafka.connect.runtime.distributed.DistributedHerder.access$1300(DistributedHerder.java:110)
        at org.apache.kafka.connect.runtime.distributed.DistributedHerder$15.call(DistributedHerder.java:924)
        at org.apache.kafka.connect.runtime.distributed.DistributedHerder$15.call(DistributedHerder.java:920)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: com.amazonaws.SdkClientException: Couldn't initialize a SAX driver to create an XMLReader
        at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.<init>(XmlResponsesSaxParser.java:123)
        at com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsV2Unmarshaller.unmarshall(Unmarshallers.java:127)
        at com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsV2Unmarshaller.unmarshall(Unmarshallers.java:117)
        at com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:62)
        at com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:31)
        at com.amazonaws.http.response.AwsResponseHandlerAdapter.handle(AwsResponseHandlerAdapter.java:69)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleResponse(AmazonHttpClient.java:1714)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleSuccessResponse(AmazonHttpClient.java:1434)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1356)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1139)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:796)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:764)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:738)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:698)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:680)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:544)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:524)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5052)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4998)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4992)
        at com.amazonaws.services.s3.AmazonS3Client.listObjectsV2(AmazonS3Client.java:938)
        at io.confluent.connect.s3.source.S3Storage.listFolders(S3Storage.java:283)
        ... 16 more
Caused by: org.xml.sax.SAXException: SAX2 driver class org.apache.xerces.parsers.SAXParser not found
java.lang.ClassNotFoundException: org.apache.xerces.parsers.SAXParser
        at org.xml.sax.helpers.XMLReaderFactory.loadClass(XMLReaderFactory.java:230)
        at org.xml.sax.helpers.XMLReaderFactory.createXMLReader(XMLReaderFactory.java:191)
        at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.<init>(XmlResponsesSaxParser.java:120)
        ... 37 more
Caused by: java.lang.ClassNotFoundException: org.apache.xerces.parsers.SAXParser
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at org.apache.kafka.connect.runtime.isolation.PluginClassLoader.loadClass(PluginClassLoader.java:104)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at org.xml.sax.helpers.NewInstance.newInstance(NewInstance.java:82)
        at org.xml.sax.helpers.XMLReaderFactory.loadClass(XMLReaderFactory.java:228)
        ... 39 more

Connector config:

{
    "name": "source-connector",
    "config": {
      "connector.class":"io.confluent.connect.s3.source.S3SourceConnector",
      "s3.bucket.name":"bucket-test",
      "s3.region":"us-west-2",
      "tasks.max":"1",
      "topics":"migration-topic",
      "topics.dir":"topics/events",
      "format.class":"io.confluent.connect.s3.format.json.JsonFormat",
      "behavior.on.error": "log",
      "partitioner.class":"io.confluent.connect.storage.partitioner.TimeBasedPartitioner",
      "path.format":"'date'=YYYY-MM-dd/'hour'=HH",
      "key.converter":"com.pandadoc.kafka.connect.msgpack.converter.MessagePackConverter",
      "key.converter.schemas.enable":"false",
      "value.converter":"com.pandadoc.kafka.connect.msgpack.converter.MessagePackConverter",
      "value.converter.schemas.enable":"false",
      "errors.tolerance": "all",
      "errors.deadletterqueue.topic.name": "kafka-connect-dead-letter-queue",
      "errors.deadletterqueue.context.headers.enable": true,
      "confluent.license":"",
      "confluent.topic.bootstrap.servers":"localhost:9092",
      "confluent.topic.replication.factor":"3"
    }
  }

Versions:

[2020-12-22 15:27:41,640] INFO Kafka version: 2.2.2-cp3 (org.apache.kafka.common.utils.AppInfoParser)
[2020-12-22 15:27:41,640] INFO Kafka commitId: 602b2e2e105b4d34 (org.apache.kafka.common.utils.AppInfoParser)  

It could be a JDK bug: https://bugs.openjdk.java.net/browse/JDK-8015099. It has been fixed in JDK 9+.
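
One way to narrow this down is a small standalone check, sketched below. It is only a diagnostic under stated assumptions: the class `SaxDriverCheck` and its method names are made up for illustration, and it inspects the JVM and class path it is run with, not the isolated plugin class loader inside the Connect worker. It prints whether the `org.xml.sax.driver` system property is set, whether `org.apache.xerces.parsers.SAXParser` (the class named in the stack trace) can be loaded, and which `XMLReader` implementation JAXP's `SAXParserFactory` returns.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

// Hypothetical helper for illustration; not part of Kafka Connect or the AWS SDK.
final class SaxDriverCheck {

    // Returns true if the class name resolves through the given class loader.
    static boolean isLoadable(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Creates an XMLReader through JAXP instead of XMLReaderFactory,
    // which is the factory that appears in the failing stack trace.
    static XMLReader newJaxpReader() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newSAXParser().getXMLReader();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("JAXP could not create an XMLReader", e);
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        // XMLReaderFactory consults this system property before falling back
        // to a META-INF/services lookup on the class path.
        System.out.println("org.xml.sax.driver = " + System.getProperty("org.xml.sax.driver"));
        System.out.println("xerces loadable    = "
                + isLoadable("org.apache.xerces.parsers.SAXParser", loader));
        System.out.println("JAXP XMLReader     = " + newJaxpReader().getClass().getName());
    }
}
```

If the property is unset and Xerces is not loadable here, the Xerces class name most likely reached the failing lookup from somewhere else in the worker, which would be consistent with the JDK bug rather than with the connector configuration.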

The Confluent Docker image confluentinc/cp-kafka-connect:5.2.4 uses JDK 8:

openjdk version "1.8.0_172"
OpenJDK Runtime Environment (Zulu 8.30.0.1-linux64) (build 1.8.0_172-b01)
OpenJDK 64-Bit Server VM (Zulu 8.30.0.1-linux64) (build 25.172-b01, mixed mode)

Any other ideas on what could be wrong?


Solution

  • I've sorted the issue out 😅

    It turned out that the JDK bug was what caused this behavior. There is an interoperability table for Kafka Connect and Kafka versions, so there are two options:

    1. Tweak the Kafka Connect Docker image by installing JDK 9+.
    2. Upgrade Kafka Connect to 6.x (if your Kafka version allows it), which uses JDK 11.
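
Whichever option is chosen, it may help to confirm which Java runtime the worker actually ends up on. The following is a minimal sketch, assuming the "fixed in JDK 9+" threshold from the linked bug report. `JavaRuntimeCheck` and its method names are hypothetical. It parses the standard `java.specification.version` system property, which reads `1.8` on JDK 8 and `9`, `11`, and so on for later releases.

```java
// Hypothetical helper for illustration; the threshold of 9 comes from the bug report cited in the question.
final class JavaRuntimeCheck {

    // Maps a specification version string to its major number:
    // "1.8" -> 8, "9" -> 9, "11" -> 11.
    static int majorVersion(String specVersion) {
        String[] parts = specVersion.trim().split("\\.");
        int first = Integer.parseInt(parts[0]);
        // Before Java 9 the scheme was "1.x", so the major number is the second part.
        return (first == 1 && parts.length > 1) ? Integer.parseInt(parts[1]) : first;
    }

    // True when the runtime is new enough to include the fix, per the bug report.
    static boolean hasSaxDriverFix(String specVersion) {
        return majorVersion(specVersion) >= 9;
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        System.out.println("java.specification.version = " + spec);
        System.out.println("JDK 9+ (fix expected)       = " + hasSaxDriverFix(spec));
    }
}
```

Compile it with `javac` and run it with the same `java` binary that starts the Connect worker, so that the reported version is the one the connector really sees.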