Is the working directory of the dagster main process different from that of the scheduler processes?


I'm having an issue with the loading of a file from dagster code (setup, not pipelines). Say I have the following project structure:

pipelines/
-app/
--environments/
---schedules.yaml
--repository.py
--repository.yaml

When I run dagit from inside the project folder ($ cd pipelines && dagit -y app/repository.yaml), that folder becomes the working directory, so inside repository.py I can load a file using a path relative to the project root:

# repository.py

with open('app/environments/schedules.yaml', 'r') as f:
    config_text = f.read()  # do something with the file

However, if I set up a schedule, the pipelines in the project do not run. According to the cron logs, the open call raises a FileNotFoundError. I was wondering whether this happens because the working directory is different when cron executes the schedule.

For context, I'm loading a config file with parameters of cron_schedules for each pipeline. Also, here's the tail of the stacktrace in my case:

  File "/home/user/.local/share/virtualenvs/pipelines-mfP13m0c/lib/python3.8/site-packages/dagster/core/definitions/handle.py", line 190, in from_yaml
    return LoaderEntrypoint.from_file_target(
  File "/home/user/.local/share/virtualenvs/pipelines-mfP13m0c/lib/python3.8/site-packages/dagster/core/definitions/handle.py", line 161, in from_file_target
    module = import_module_from_path(module_name, os.path.abspath(python_file))
  File "/home/user/.local/share/virtualenvs/pipelines-mfP13m0c/lib/python3.8/site-packages/dagster/seven/__init__.py", line 75, in import_module_from_path
    spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 783, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/user/pipelines/app/repository.py", line 28, in <module>
    schedule_builder = ScheduleBuilder(settings.CRON_PRESET, settings.ENV_DICT)
  File "/home/user/pipelines/app/schedules.py", line 12, in __init__
    self.cron_schedules = self._load_schedules_yaml()
  File "/home/user/pipelines/app/schedules.py", line 16, in _load_schedules_yaml
    with open(path) as f:
FileNotFoundError: [Errno 2] No such file or directory: 'app/environments/schedules.yaml'

Solution

  • Build the file's path relative to the source file instead of relying on the current working directory, so that it opens correctly no matter which directory the process is started from.

    from dagster.utils import file_relative_path
    
    with open(file_relative_path(__file__, './environments/schedules.yaml'), 'r') as f:
        config_text = f.read()  # do something with the file
    

    file_relative_path simply does the following, so you can call the os.path functions directly if you prefer:

    import os

    def file_relative_path(dunderfile, relative_path):
        return os.path.join(os.path.dirname(dunderfile), relative_path)
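
If you would rather not depend on dagster for this, the same idea can be sketched with the standard library alone. This is a minimal sketch and not dagster's own code: the helper name path_next_to is hypothetical, and it assumes the YAML file lives in a folder next to the module that loads it.

```python
import os
from pathlib import Path


def path_next_to(dunderfile, relative_path):
    """Return relative_path resolved against the directory containing dunderfile.

    path_next_to is a hypothetical helper name, not part of dagster.
    """
    # abspath() pins down the module's location first; the current working
    # directory only matters if dunderfile itself is a relative path.
    module_dir = Path(os.path.abspath(dunderfile)).parent
    return module_dir / relative_path
```

In schedules.py you would then call open(path_next_to(__file__, 'environments/schedules.yaml')). As long as __file__ is an absolute path, the result should be the same whether the module is loaded by dagit or by the cron-launched process.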