apache-flink

Process streams one by one and not in parallel


Flink beginner - Need to process datastreams one after another and not in parallel

I have one datastream per file.

I need to maintain the order of processing, but the streams are all processed in parallel. Switching to DataSets did not help either. Any suggestions?


Solution

  • You can use org.apache.flink.streaming.api.functions.source.ContinuousFileMonitoringFunction, which monitors a path and forwards file splits downstream in order of modification time. To process the files sequentially, set the parallelism of the downstream operators to 1, which is a little tricky. Alternatively, implement your own custom SourceFunction that forwards the file contents in the order you need.
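A minimal sketch of the custom-source idea follows. It assumes the files sit in a single local directory and that "order" means oldest modification time first, with ties broken by file name. It deliberately avoids Flink so it runs on its own. The class and method names (OrderedFileEmitter, filesByModificationTime, emitInOrder, demo) are hypothetical. In a real job you would presumably call emitInOrder(dir, ctx::collect) from the run(SourceContext<String> ctx) method of a non-parallel SourceFunction<String>, which Flink runs with parallelism 1. You would also keep the downstream operators at parallelism 1 so the order is not lost.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: the ordering logic a custom (non-parallel) source could use.
public class OrderedFileEmitter {

    /** Regular files in dir, oldest modification time first; ties broken by path name. */
    static List<Path> filesByModificationTime(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .sorted(Comparator.<Path>comparingLong(OrderedFileEmitter::lastModified)
                            .thenComparing(Path::toString))
                    .collect(Collectors.toList());
        }
    }

    // Wraps the checked IOException so the method can be used inside a comparator.
    private static long lastModified(Path p) {
        try {
            return Files.getLastModifiedTime(p).toMillis();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Emits every line of every file, one whole file at a time, in modification-time order. */
    static void emitInOrder(Path dir, Consumer<String> out) throws IOException {
        for (Path file : filesByModificationTime(dir)) {
            for (String line : Files.readAllLines(file)) { // reads as UTF-8
                out.accept(line);
            }
        }
    }

    /** Builds two temp files with explicit timestamps and returns the lines in emitted order. */
    static List<String> demo() {
        try {
            Path dir = Files.createTempDirectory("ordered-files");
            Path older = Files.write(dir.resolve("b.txt"), List.of("b1", "b2"));
            Path newer = Files.write(dir.resolve("a.txt"), List.of("a1"));
            // Set timestamps explicitly so the result does not depend on creation timing.
            Files.setLastModifiedTime(older, FileTime.fromMillis(1_000_000L));
            Files.setLastModifiedTime(newer, FileTime.fromMillis(2_000_000L));
            List<String> seen = new ArrayList<>();
            emitInOrder(dir, seen::add);
            return seen;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [b1, b2, a1]
    }
}
```

Running main prints [b1, b2, a1]. b.txt is emitted completely before a.txt because its modification time is earlier, even though its name sorts later. Sorting the whole file list up front and reading one file at a time is what gives the sequential, non-interleaved order the question asks for.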