scala, apache-kafka, avro

java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to java.lang.String


I am running a Kafka consumer program to read Avro-formatted data from topics. After polling, I iterate over the returned records and call record.value() on each one. I want to convert that value to a String, but the conversion fails.

 def getProp():Properties = {

    val props = new Properties()
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, servers)
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, serializer)
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, deserializer)
    props.put(ConsumerConfig.GROUP_ID_CONFIG, groupid)
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, autoOffset)
    //props.put("specific.avro.reader", specificAvroReader)
    props.put("schema.registry.url", schemaRegestry)
    props.put("consumer-timeout-ms", "30000")
    props
  }

def consume(props: Properties, spark: SparkSession) = {
    val conSumer = new KafkaConsumer[String, String](props)
    conSumer.subscribe(util.Collections.singletonList(topic))
    while (true) {
      val records: ConsumerRecords[String,String] = conSumer.poll(100)
      for(record <- records.asScala){
        val m:String = record.value()

Error:
20/02/01 05:04:55 INFO internals.ConsumerCoordinator: Setting newly assigned partitions [xx1, xx2, xx3, xx4] for group xxxxxx
Exception in thread "main" java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to java.lang.String
        at com.hbc.IntellicheckConsumer$$anonfun$consume$1.apply(TestScala.scala:52)
        at com.hbc.IntellicheckConsumer$$anonfun$consume$1.apply(TestScala.scala:49)
        at scala.collection.Iterator$class.foreach(Iterator.scala:893)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
        at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
        at com.hbc.IntellicheckConsumer.consume(TestScala.scala:49)
        at com.hbc.TestScala$.main(TestScala.scala:93)
        at com.hbc.TestScala.main(TestScala.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
20/02/01 05:04:55 INFO spark.SparkContext: Invoking stop() from shutdown hook
20/02/01 05:04:55 INFO server.AbstractConnector: Stopped Spark@33aecef7{HTTP/1.1,[http/1.1]}{0.0.0.0:4041}
20/02/01 05:04:55 INFO ui.SparkUI: Stopped Spark web UI at http://172.1


Solution

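  • You're telling your consumer that you want strings, but the deserializer is handing back Avro GenericRecord objects, so the cast fails when record.value() is assigned to a String: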

    new KafkaConsumer[String, String](props)
    ConsumerRecords[String,String]
    

    Instead, you probably want

    new KafkaConsumer[String, GenericRecord](props)
    ConsumerRecords[String,GenericRecord]
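
As a rough sketch of that change (not the asker's actual program), the consumer below is typed as `KafkaConsumer[String, GenericRecord]` and turns each value into a String with `toString`, which for a `GenericRecord` yields a JSON-like rendering. The broker address, group id, topic name, and registry URL are placeholders, String keys are an assumption, and `poll(Duration)` assumes kafka-clients 2.0 or later (older clients take `poll(100)` as in the question). It also assumes Confluent's `KafkaAvroDeserializer` is on the classpath.

```scala
import java.time.Duration
import java.util.{Collections, Properties}

import org.apache.avro.generic.GenericRecord
import org.apache.kafka.clients.consumer.{ConsumerConfig, KafkaConsumer}

import scala.collection.JavaConverters._

object GenericRecordConsumer {

  // GenericRecord.toString renders the record as JSON-like text,
  // so no cast to String is needed (or possible).
  def valueAsString(value: GenericRecord): String = value.toString

  def main(args: Array[String]): Unit = {
    val props = new Properties()
    // Placeholder connection settings; replace with your own.
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "example-group")
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")
    // Assumption: keys are plain strings, values are Confluent-encoded Avro.
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
      "org.apache.kafka.common.serialization.StringDeserializer")
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
      "io.confluent.kafka.serializers.KafkaAvroDeserializer")
    props.put("schema.registry.url", "http://localhost:8081")

    // The value type parameter matches what the deserializer produces.
    val consumer = new KafkaConsumer[String, GenericRecord](props)
    consumer.subscribe(Collections.singletonList("example-topic"))
    try {
      while (true) {
        val records = consumer.poll(Duration.ofMillis(100))
        for (record <- records.asScala) {
          println(valueAsString(record.value()))
        }
      }
    } finally {
      consumer.close()
    }
  }
}
```

The type parameters on `KafkaConsumer` are erased at runtime, which is why the original code compiled and only failed once `record.value()` was assigned to a `String`.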
    

    "Is there any way to read those Avro records into a Spark DataFrame for further processing?"

    Well, you'd have to rewrite your code to actually use Spark Structured Streaming.

    You don't need spark-submit just to run Scala code.
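
For orientation, here is a minimal sketch of what a Structured Streaming version could look like. It assumes Spark 3.x with the `spark-sql-kafka-0-10` and `spark-avro` packages available, and it assumes the messages were written with Confluent's serializer, whose wire format puts a 5-byte header (magic byte plus schema id) in front of the Avro payload; Spark's `from_avro` does not understand that header, so the sketch strips it. The schema, topic, and broker address are hypothetical, and the schema is hard-coded rather than fetched from the registry.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.avro.functions.from_avro
import org.apache.spark.sql.functions.expr

object AvroStreamSketch {

  // Hypothetical value schema; in practice, copy the real one from the schema registry.
  val valueSchemaJson: String =
    """{"type":"record","name":"Example","fields":[
      |  {"name":"id","type":"string"},
      |  {"name":"payload","type":"string"}
      |]}""".stripMargin

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("avro-stream-sketch").getOrCreate()

    // The Kafka source exposes key and value as binary columns.
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // placeholder
      .option("subscribe", "example-topic")                // placeholder
      .option("startingOffsets", "earliest")
      .load()

    // Skip the 5-byte Confluent header, then decode the remaining Avro bytes.
    val avroBytes = expr("substring(value, 6, length(value) - 5)")
    val decoded = raw
      .select(from_avro(avroBytes, valueSchemaJson).as("record"))
      .select("record.*")

    // Print decoded rows to the console; swap in a real sink as needed.
    decoded.writeStream
      .format("console")
      .start()
      .awaitTermination()
  }
}
```

Hard-coding the schema keeps the sketch short, but it will not follow schema evolution; a registry-aware approach would need an additional library or a lookup step that is not shown here.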

    "The value I receive is in the form of nested JSON."

    Not sure why you're using Avro if you're just putting JSON strings into fields anyway.
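
If a field really does hold a JSON string, the same casting pitfall shows up one level down: a decoded Avro string field is normally an `org.apache.avro.util.Utf8`, not a `java.lang.String`, so call `toString` on it instead of casting. The sketch below uses only the Avro library and a made-up schema with a single hypothetical `payload` field; it round-trips a record through Avro binary encoding in memory to mimic what a deserializer hands back.

```scala
import java.io.ByteArrayOutputStream

import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericDatumReader, GenericDatumWriter, GenericRecord}
import org.apache.avro.io.{DecoderFactory, EncoderFactory}

object FieldAccessSketch {

  // Hypothetical schema with one string field that carries a JSON document.
  val schema: Schema = new Schema.Parser().parse(
    """{"type":"record","name":"Example","fields":[{"name":"payload","type":"string"}]}""")

  // Encode and decode in memory, mimicking what a consumer receives.
  def roundTrip(record: GenericRecord): GenericRecord = {
    val out = new ByteArrayOutputStream()
    val encoder = EncoderFactory.get().binaryEncoder(out, null)
    new GenericDatumWriter[GenericRecord](schema).write(record, encoder)
    encoder.flush()
    val decoder = DecoderFactory.get().binaryDecoder(out.toByteArray, null)
    new GenericDatumReader[GenericRecord](schema).read(null, decoder)
  }

  // Safe for both String and Utf8 values: toString instead of a cast.
  def payloadOf(record: GenericRecord): String = record.get("payload").toString

  def main(args: Array[String]): Unit = {
    val original = new GenericData.Record(schema)
    original.put("payload", """{"user":{"id":1}}""")

    val decoded = roundTrip(original)
    // Shows the runtime class of the decoded field value.
    println(decoded.get("payload").getClass.getName)
    println(payloadOf(decoded))
  }
}
```

The resulting String can then be handed to whichever JSON parser you already use; parsing itself is left out of the sketch.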