I've built the simplest possible pipeline using Beam YAML (Python SDK): it reads a CSV file and should print it to the log. When running with the default DirectRunner:
python -m apache_beam.yaml.main --pipeline_spec_file=pipeline-01.yaml
everything works fine and I do see the output. However, when using the FlinkRunner:
python -m apache_beam.yaml.main --pipeline_spec_file=pipeline-01.yaml --runner=FlinkRunner --flink_version=1.16 --flink_master=localhost:8081 --environment_type=EXTERNAL --environment_config=localhost:50000
no logs are printed, even though I can see in the Flink Dashboard that the run succeeded.
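For context, --environment_type=EXTERNAL assumes a Beam Python SDK worker pool is already listening on the address given in --environment_config. I start it beforehand with roughly the following (the port just has to match; adjust if your setup differs):

python -m apache_beam.runners.worker.worker_pool_main --service_port=50000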
My pipeline:
pipeline:
  type: chain
  transforms:
    - type: ReadFromCsv
      config:
        path: data/input2.csv
    - type: LogForTesting
The path points to a file stored locally on my computer.
Can anyone clarify? Thanks.
The answer to this was quite embarrassing...
My pipeline was reading a file saved locally, but I forgot to copy it onto the Flink cluster, so basically there were no logs because the file was "empty" as far as the workers were concerned when running with the FlinkRunner. Once the file was copied over, it worked great :)
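For anyone hitting the same thing: with the FlinkRunner, ReadFromCsv runs on the SDK workers attached to the Flink cluster, so the path must be readable from those workers, not just from the machine submitting the job. A minimal sketch of the fixed pipeline, assuming the file was copied to a hypothetical /data/input2.csv on the workers' filesystem (a shared or distributed path would work the same way):

pipeline:
  type: chain
  transforms:
    # hypothetical path that is visible to the Flink/SDK workers
    - type: ReadFromCsv
      config:
        path: /data/input2.csv
    - type: LogForTesting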