Search code examples
apache-flink

Unable to run a python flink application on cluster


I am trying to run a Python Flink Application on the standalone Flink cluster. The application works fine on a single node cluster but it throws the following error on a multi-node cluster. java.lang.Exception: The user defined 'open()' method caused an exception: An error occurred while copying the file. Please help me resolve this problem. Thank you

The application I am trying to execute has the following code.

from flink.plan.Environment import get_environment
from flink.plan.Constants import INT, STRING, WriteMode

env = get_environment()

data = env.from_elements("Hello")

data.map(lambda x: list(x)).output()
env.execute()

Solution

  • You have to configure "python.dc.tmp.dir" in "flink-conf.yaml" to point to a distributed filesystem (like HDFS). This directory is used to distributed the python scripts.