spring-cloud-dataflow

contentType getting prefixed to data written from HDFS sink


I am using the HDFS sink to write to HDFS, but the data that ends up in HDFS is prefixed with ? contentType "text/plain", even though this is not in the payload. Why is this prefix added, and how can I remove it?

stream create --definition ":streaming --spring.cloud.stream.bindings.output.producer.headerMode=raw > myprocessor --spring.cloud.stream.bindings.output.content-type=text/plain --spring.cloud.stream.bindings.input.consumer.headerMode=raw|hdfs --spring.hadoop.fsUri=hdfs://127.0.0.1:50071 --hdfs.directory=/ws/sparkoutput --hdfs.file-name=sparkstream --hdfs.enable-sync=true --hdfs.flush-timeout=10000 --spring.cloud.stream.bindings.input.consumer.headerMode=raw --spring.cloud.stream.bindings.input.content-type=text/plain" --name sparkstream


Solution

  • The prefix is the message headers. The hdfs input is set to headerMode=raw, so it does not strip embedded headers, but the output of myprocessor is not raw and, by default, still embeds its headers (here contentType) in the message body. If the hdfs input is meant to be raw, make the output of myprocessor raw as well, i.e.

     myprocessor --spring.cloud.stream.bindings.output.content-type=text/plain --spring.cloud.stream.bindings.input.consumer.headerMode=raw --spring.cloud.stream.bindings.output.producer.headerMode=raw
    

    Alternatively, remove the headerMode settings from hdfs. The sink will then parse the embedded headers itself and write only the payload.
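
    For reference, the first option applied to the full definition from the question might look like the sketch below. It is untested: it assumes the rest of the original definition is otherwise correct for your environment, and the only changes are the added output.producer.headerMode=raw on myprocessor and the removal of the duplicated input.consumer.headerMode=raw on hdfs.

     stream create --definition ":streaming --spring.cloud.stream.bindings.output.producer.headerMode=raw > myprocessor --spring.cloud.stream.bindings.output.content-type=text/plain --spring.cloud.stream.bindings.input.consumer.headerMode=raw --spring.cloud.stream.bindings.output.producer.headerMode=raw | hdfs --spring.hadoop.fsUri=hdfs://127.0.0.1:50071 --hdfs.directory=/ws/sparkoutput --hdfs.file-name=sparkstream --hdfs.enable-sync=true --hdfs.flush-timeout=10000 --spring.cloud.stream.bindings.input.consumer.headerMode=raw --spring.cloud.stream.bindings.input.content-type=text/plain" --name sparkstream

    For the second option, leave myprocessor as it was in the question and drop only the headerMode property from the hdfs segment, again as an untested sketch:

     hdfs --spring.hadoop.fsUri=hdfs://127.0.0.1:50071 --hdfs.directory=/ws/sparkoutput --hdfs.file-name=sparkstream --hdfs.enable-sync=true --hdfs.flush-timeout=10000 --spring.cloud.stream.bindings.input.content-type=text/plain

    Either way, the point is that both ends of the myprocessor-to-hdfs link should agree on the header mode.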