apache-spark, streaming, spark-structured-streaming, apache-hudi

Spark structured streaming with Apache Hudi


I have a requirement to write a stream to a Hudi dataset using Spark Structured Streaming. From the Apache Hudi Jira issues it looks like there is a provision for this, but I wanted to know whether anyone has implemented it successfully and has an example. I am trying to stream data from AWS Kinesis Firehose to Apache Hudi using Spark Structured Streaming.

Quick help is appreciated.


Solution

  • I know of at least one user using the structured streaming sink in Hudi. The test at https://github.com/apache/incubator-hudi/blob/master/hudi-spark/src/test/scala/DataSourceTest.scala#L190 might help.
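
For a rough idea of what such a sink can look like from PySpark, here is a minimal sketch; it is not taken from the linked test. The helper names `hudi_stream_options` and `start_hudi_stream` are hypothetical, as are the table, key, and path values you pass in. It assumes the Hudi Spark bundle is on the classpath and that your Hudi release registers the `hudi` format name (older releases use `org.apache.hudi`). The streaming DataFrame itself is left to the caller, since how you read the Firehose output depends on your setup.

```python
def hudi_stream_options(table_name, record_key, precombine_field, partition_field):
    """Build the basic Hudi write options for an upsert sink.

    All four arguments are placeholders for your own table and column names.
    """
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        "hoodie.datasource.write.operation": "upsert",
    }


def start_hudi_stream(stream_df, base_path, checkpoint_path, options):
    """Start a streaming write of stream_df into the Hudi table at base_path.

    stream_df is any streaming DataFrame (the source is up to the caller).
    Returns the StreamingQuery handle produced by start().
    """
    return (
        stream_df.writeStream
        .format("hudi")  # assumption: older Hudi releases need "org.apache.hudi"
        .options(**options)
        .option("checkpointLocation", checkpoint_path)
        .outputMode("append")
        .start(base_path)
    )
```

Usage would be along the lines of `start_hudi_stream(df, base_path, checkpoint_path, hudi_stream_options("events", "id", "ts", "dt")).awaitTermination()`, with every name there replaced by your own. Check the option keys against the configuration reference for the Hudi version you run.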