I know I can use fan-in and fan-out notation in the Streams DSL to cascade multiple flow modifiers within the same stream, like:
s3 > :data
ftp > :data
http > :data
:data > file
However, supposing I have stream1 and stream2 registered, each with its own DSL, how can I cascade both in a similar way?
For instance:
time | transform | file
file | filter | http
Then I want to create something like stream3:
stream1 | stream2 | s3
Meaning both time and file would be read, and results would be written to file, http and s3. Is it possible?
I am assuming that, besides writing to the sinks I specified, stream1 and stream2 would also write back to the pipe, so I would be able to cascade them and call them from stream3.
EDIT - Problem I am trying to solve.
As asked in the comments, let me detail what kind of problem I am trying to solve here.
The organization where I work has complex flows, and no single team maintains the whole thing: there are data-producer teams that generate data from the sources, and between them and the final data consumer there are usually many teams inferring, transforming and normalizing the data, etc.
stream1, in my case, would be maintained by one team in the company, stream2 by another team, and stream3 by my platform.
Although the teams maintaining each flow are independent, technically I still want the benefits of in-memory pipelining when aggregating the different flows. Of course, I expect to have to manage problems myself, such as team 1 releasing a new version of stream1 that requires a new version of stream2 - that's fine for my use case; my team could take care of that.
There are a few approaches to solve this.
Option 1: Combine the responsibilities of processor + sink into a single "processor"; if that's done, you'd be able to consume the ingest (or) ingest + processing (or) ingest + processing + write as separate fragments of a streaming pipeline, and use them in the DSL or in the GUI. The drawback of the composition, of course, is the inability to interact with and upgrade the business logic independently, which might not be a problem for some use cases. In other words, you don't necessarily need them as individual microservices - if the responsibilities are thin, it is perfectly fine to compose them in a single App.
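For illustration only, here is a minimal sketch of what such a combined "processor + sink" App could look like, assuming the function-based Spring Cloud Stream programming model; the class name, bean name and the transform/write logic are placeholders, not anything prescribed by SCDF:

import java.util.function.Consumer;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;

@SpringBootApplication
public class TransformAndWriteApplication {

    public static void main(String[] args) {
        SpringApplication.run(TransformAndWriteApplication.class, args);
    }

    // One bean covers both the processor and the sink responsibility:
    // it transforms the incoming payload and writes the result itself,
    // instead of emitting it to a separate downstream app.
    @Bean
    public Consumer<String> transformAndWrite() {
        return payload -> {
            String transformed = payload.toUpperCase();    // stand-in for real business logic
            System.out.println("writing: " + transformed); // stand-in for the file/S3 write
        };
    }
}

Registered in SCDF as a sink, such an App could let a pipeline like time | transform-and-write stand in for time | transform | file, keeping both responsibilities in one deployable unit.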
Option 2: Don't use SCDF. Build a similar set of Apps as discussed previously and orchestrate them as standalone Apps. You won't have to deal with the source, processor, or sink contracts in SCDF, so you will have the flexibility to plug the Apps together in any manner you wish. The drawback here is the manual orchestration; data pipeline visibility for monitoring is another concern. See example here. If you still want to use it in SCDF, though it may not be ideal, there's a workaround - see here.
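As a rough sketch of that standalone approach (the app, function and destination names here are assumptions), each App can be a plain Spring Cloud Stream application whose input/output destinations are wired explicitly at launch time rather than by SCDF:

import java.util.function.Function;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;

// Launched standalone, e.g.:
//   java -jar transform-app.jar \
//     --spring.cloud.stream.bindings.transform-in-0.destination=ingested-data \
//     --spring.cloud.stream.bindings.transform-out-0.destination=transformed-data
// so the wiring between Apps is done with explicit destinations instead of SCDF.
@SpringBootApplication
public class TransformApplication {

    public static void main(String[] args) {
        SpringApplication.run(TransformApplication.class, args);
    }

    // Plain function; which middleware destinations it reads from and writes to
    // is decided entirely by the binding properties passed at launch.
    @Bean
    public Function<String, String> transform() {
        return String::toUpperCase; // stand-in for real business logic
    }
}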
Option 3: You can also build an integration flow using Spring Integration and build that into an SCSt (Spring Cloud Stream) App. This could give you more granular control, but you'd have to orchestrate the Apps individually, as in Option 2. Here's an example.
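Here is a rough sketch of such a flow, using Spring Integration's Java DSL inside a Spring Boot App; the channel names and the filter/transform steps are illustrative only:

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.integration.dsl.IntegrationFlow;
import org.springframework.integration.dsl.IntegrationFlows;

@SpringBootApplication
public class IntegrationFlowApplication {

    public static void main(String[] args) {
        SpringApplication.run(IntegrationFlowApplication.class, args);
    }

    // A single flow definition gives step-by-step control over the processing;
    // "inputChannel" and "outputChannel" are placeholder channel names that
    // would be bound to the middleware when packaged as an SCSt App.
    @Bean
    public IntegrationFlow processingFlow() {
        return IntegrationFlows.from("inputChannel")
                .<String>filter(payload -> !payload.isEmpty())  // drop empty payloads
                .transform(String.class, String::toUpperCase)   // stand-in business logic
                .channel("outputChannel")
                .get();
    }
}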