xslt, apache-kafka, publish-subscribe, middleware, saxon

XSLT-based transformation "service" on top of Apache Kafka


At the time I am writing this question, there are not (yet) any questions tagged with both [apache-kafka] and [xslt].

I am a "classic" message-oriented middleware (BizTalk, TIBCO, ...) guy who is just discovering Kafka and its IMPRESSIVE performance figures!

And now I am wondering what the recommendation from the "Kafka community" is on how to transform the message payload between its publication and its consumption...

Indeed, in my world of integration, the data structure (i.e. format) exposed by the producer is usually radically different from the data structure (format) expected by the consumer. As an example, I may have, as a producer, a mainframe application formatting data in a COBOL copybook structure while my front-end application wants to consume a modern JSON format.

[Update following the 1st answer from @morganw09dev]

I like the proposal from @morganw09dev, but I am a bit "annoyed" by the creation of consumer-specific topics. I see "Topic B" (see @morganw09dev's first answer) as a topic specific to my front-end application, existing only so that it can consume information from "Topic A". In other words, this specificity makes "Topic B" a queue ;-) That is fine, but I am wondering whether such a design would not "hurt" a Kafka native ;-)

From my preliminary readings on Kafka, it is clear that I should also learn more about Storm... but then I discovered Flink, which, according to the graph at https://flink.apache.org/features.html, looks MUCH more performant than Storm, and now @morganw09dev has mentioned Samza! That means I don't know where to start ;-)

Ultimately, I would like to code my transformations in XSLT, and, in the Java world, I think Saxon is one of the leading XSLT processors. Do you know of any "integration" of Saxon with Storm, Flink or Samza? Or maybe my question does not make sense and I have to find another "way" to use Saxon with Kafka.
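
For reference, here is roughly what I have in mind for the Saxon side, independently of any streaming framework: a minimal sketch using Saxon's s9api (the class name, file name and wiring are just placeholders of mine, not an existing integration):

    import java.io.File;
    import java.io.StringReader;
    import java.io.StringWriter;

    import javax.xml.transform.stream.StreamSource;

    import net.sf.saxon.s9api.Processor;
    import net.sf.saxon.s9api.SaxonApiException;
    import net.sf.saxon.s9api.Serializer;
    import net.sf.saxon.s9api.XsltCompiler;
    import net.sf.saxon.s9api.XsltExecutable;
    import net.sf.saxon.s9api.XsltTransformer;

    public class XsltPayloadTransformer {

        private final Processor processor;
        private final XsltExecutable stylesheet;

        public XsltPayloadTransformer(File xsltFile) throws SaxonApiException {
            processor = new Processor(false); // false = Saxon Home Edition, no license required
            XsltCompiler compiler = processor.newXsltCompiler();
            stylesheet = compiler.compile(new StreamSource(xsltFile)); // compile once, reuse for every message
        }

        // Applies the compiled stylesheet to a single XML payload and returns the result as a String
        public String transform(String xmlPayload) throws SaxonApiException {
            XsltTransformer transformer = stylesheet.load(); // fresh XsltTransformer per message (not thread-safe)
            StringWriter output = new StringWriter();
            Serializer serializer = processor.newSerializer(output);
            transformer.setSource(new StreamSource(new StringReader(xmlPayload)));
            transformer.setDestination(serializer);
            transformer.transform();
            return output.toString();
        }
    }

The idea would be to call transform() for each message payload inside whatever ends up consuming from Kafka, whichever framework that turns out to be.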

At the time I am writing this comment, there are not (yet) any questions tagged with [saxon] in combination with any of [apache-kafka], [apache-storm], [apache-flink] and/or [apache-samza].


Solution

  • Kafka itself cannot be used to transform data. It's only used for storing data to be consumed later.

    One thought is to use a three-part architecture:

    Kafka Topic A => Transformer => Kafka Topic B
    

    Per your example, your producer pushes COBOL-related data to Kafka Topic A. Your Transformer reads from Topic A, does the necessary transformations and then outputs JSON to Topic B. Once the data is in Topic B, the front-end application can read it in its preferred format. If you go that route, the Transformer could be custom-built using Kafka's default consumer and producer (a rough sketch follows below), or you could use a streaming framework such as Apache Samza or Apache Storm to help handle the messaging. Both Samza and Kafka were initially developed at LinkedIn and I believe they work fairly naturally together. (Though I have never tried Samza.)
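
    To make that concrete, here is a minimal sketch of such a custom-built Transformer using the plain Kafka consumer and producer APIs. The topic names, group id and the transform() placeholder are illustrative assumptions rather than part of the original answer, and poll(Duration) assumes a reasonably recent kafka-clients version:

        import java.time.Duration;
        import java.util.Collections;
        import java.util.Properties;

        import org.apache.kafka.clients.consumer.ConsumerRecord;
        import org.apache.kafka.clients.consumer.ConsumerRecords;
        import org.apache.kafka.clients.consumer.KafkaConsumer;
        import org.apache.kafka.clients.producer.KafkaProducer;
        import org.apache.kafka.clients.producer.ProducerRecord;

        public class TransformerService {

            public static void main(String[] args) {
                // Consumer side: reads the raw records published to Topic A
                Properties consumerProps = new Properties();
                consumerProps.put("bootstrap.servers", "localhost:9092");
                consumerProps.put("group.id", "transformer-service");
                consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
                consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

                // Producer side: writes the transformed records to Topic B for the front-end application
                Properties producerProps = new Properties();
                producerProps.put("bootstrap.servers", "localhost:9092");
                producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
                producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

                try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
                     KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {

                    consumer.subscribe(Collections.singletonList("topic-a"));

                    while (true) {
                        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                        for (ConsumerRecord<String, String> record : records) {
                            // The actual format conversion (e.g. an XSLT transformation) happens here
                            String transformed = transform(record.value());
                            producer.send(new ProducerRecord<>("topic-b", record.key(), transformed));
                        }
                    }
                }
            }

            // Placeholder for the real transformation logic (COBOL copybook/XML in, JSON out, etc.)
            private static String transform(String input) {
                return input;
            }
        }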