Tags: jms, hdfs, kerberos, spring-xd

Spring XD stream using the HDFS-Dataset sink to save Avro data is unable to renew its Kerberos ticket


I have created a Spring XD stream: source - JMS queue -> processor - custom Java transform (XML to Avro) -> sink - HDFS-Dataset.

The stream works perfectly fine at first, but after 24 hours the long-lived connection is unable to renew its Kerberos authentication ticket and stops writing to HDFS. We restart the container where the stream is deployed, but the problem persists and we lose messages, as they are not even sent to the Redis error queue.

I need help with the following:

  1. Can the Kerberos ticket be renewed for the stream? Do I need to modify the sink code and create a custom sink?
  2. I cannot find any sink in the Spring XD documentation that is similar to HDFS-Dataset but writes to the local file system, where I would not need to go through Kerberos authentication.

Appreciate your help here.

Thanks,


Solution

  • This is a well-known but undocumented problem in Spring XD :). Something very similar happens to batch jobs that stay deployed for a long time and are run later. Why? Because the hadoopConfiguration object is forced to singleton scope and is instantiated once, when you deploy your stream/job in Spring XD, so the Kerberos credentials obtained at deployment time eventually expire. In our case we created a listener for the Spring Batch jobs that renews the ticket before each job execution. You could do something similar in your streams; use this as a guide:

    https://github.com/spring-projects/spring-hadoop/blob/master/spring-hadoop-core/src/main/java/org/springframework/data/hadoop/configuration/ConfigurationFactoryBean.java

    I hope it helps.
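To illustrate the "renew before work" pattern the answer describes: in a real deployment the renewal step would call `UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab()` from hadoop-common (or re-login with `loginUserFromKeytab`) before each write or job execution. The sketch below is a self-contained illustration of that pattern with the Hadoop call stubbed out; the class names `KerberosReloginSketch` and `TicketGuard` are hypothetical, not part of Spring XD or Spring Hadoop.

```java
import java.time.Duration;
import java.time.Instant;

// Sketch of the "renew the ticket before doing work" pattern described above.
public class KerberosReloginSketch {

    /** Hypothetical helper that tracks ticket age and re-logs in when it gets stale. */
    static class TicketGuard {
        private final Duration maxAge;
        private Instant lastLogin;

        TicketGuard(Duration maxAge) {
            this.maxAge = maxAge;
            this.lastLogin = Instant.now();
        }

        /** Call before every HDFS write (or from a Spring Batch JobExecutionListener). */
        boolean ensureFresh(Instant now) {
            if (Duration.between(lastLogin, now).compareTo(maxAge) >= 0) {
                relogin();
                lastLogin = now;
                return true;   // a re-login happened
            }
            return false;      // ticket still considered fresh
        }

        private void relogin() {
            // In real code (assumption, requires hadoop-common on the classpath):
            // UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab();
            System.out.println("re-logged in from keytab");
        }
    }

    public static void main(String[] args) {
        // Assume tickets expire after roughly 24 hours, as in the question.
        TicketGuard guard = new TicketGuard(Duration.ofHours(24));
        Instant start = Instant.now();
        System.out.println(guard.ensureFresh(start));                            // fresh ticket, no relogin
        System.out.println(guard.ensureFresh(start.plus(Duration.ofHours(25)))); // stale -> relogin
    }
}
```

In a stream, the equivalent hook would be a custom processor or sink module that invokes this check before each HDFS interaction, rather than relying on the singleton hadoopConfiguration's credentials staying valid.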