Tags: python, file, for-loop, parsing, io

Python not finding files in loop beyond the first file


I have a script that contains a for loop over the files in a directory. Both the script and the directory are placed in the same parent directory on my machine. The directory's name is passed as a parameter to the script.

I open each file (YAML), read the contents into a dict, and then call a method from my code with that dict. The first iteration (over the first file) runs without errors, but in the second iteration Python throws "FileNotFoundError: [Errno 2] No such file or directory" on the line that opens the file.

config_dir = args.param  # the directory containing the yml files
config_file_names = os.listdir(config_dir)  # list of those files

for config_file_name in config_file_names:
    config_file_path = os.path.join(config_dir, config_file_name)  # join directory and file name into a path relative to the script's working directory
    with open(config_file_path) as f:  # LINE OF ERROR IN SECOND ITERATION
        print(config_file_path)
        config = yaml.load(f, Loader=SafeLoader)
    --------------- MORE CODE ---------------

At first I thought the paths were being built incorrectly. However, this change made no difference:

config_file_path = config_dir + "/" + config_file_name
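
For reference, the same path handling can be written with `pathlib` (a sketch using a temporary stand-in directory; the real directory is `opt_configs` in the question). Note that a `Path` built from a relative directory is still relative, so this alone would not have fixed the error:

```python
import tempfile
from pathlib import Path

# Hypothetical stand-in for the real config directory.
config_dir = Path(tempfile.mkdtemp())
(config_dir / "config_a.yml").write_text("modus: train\n")

# glob() finds the yml files; Path objects can be passed to open()
# or read directly with read_text().
for config_file_path in sorted(config_dir.glob("*.yml")):
    contents = config_file_path.read_text()
```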

If I add a `continue` after the code shown above, but before the rest of the code I omitted, the script runs without error:

parser = argparse.ArgumentParser()

parser.add_argument("-p", "--param", help="Path to parameter file directory", required=True)

args = parser.parse_args()

config_dir = args.param  # the directory containing the yml files
config_file_names = os.listdir(config_dir)  # list of those files

for config_file_name in config_file_names:
    config_file_path = os.path.join(config_dir, config_file_name)
    with open(config_file_path) as f:
        print(config_file_path)
        config = yaml.load(f, Loader=SafeLoader)
        continue  # this fixes it
    --------------- MORE CODE ---------------

This produces the following console output:

C:\Users\user\mambaforge\envs\traintool\python.exe C:\Users\user\PycharmProjects\traintool\src\multiple_runs.py -p opt_configs 
2023-10-11 14:16:05.419007: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
1 Physical GPUs, 1 Logical GPUs
opt_configs\config_alpha_proteo_class.yml
2023-10-11 14:16:06.069136: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 5987 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5
opt_configs\config_amp_class.yml
opt_configs\config_biofilm_class.yml
opt_configs\config_cytotoxic_class.yml
opt_configs\config_proteo_class.yml
opt_configs\config_pseudo_biofilm_class.yml
opt_configs\config_p_aeruginosa_class.yml

Process finished with exit code 0

Don't mind the TensorFlow specifics; those lines are always printed by a particular third-party package.

This suggests the code after the `continue` is the cause. But according to PyCharm's Find Usages function, these are the only lines that read or write the relevant names: config_dir, config_file_names, config_file_name, config_file_path.

Nowhere in my entire project do I modify these variables. I don't see how anything could affect the iteration over these files. I tried removing the first file; the error then occurs at the third file (which, of course, is then the second). I can't reproduce the error outside of my program with example code.

The code of this project is spread over multiple scripts, so sharing all of it is not viable. I only work with the dictionary contents created by the YAML loader. I hope someone has an idea how anything could affect reading the files one after another. I tried this on two machines (home PC and server cluster) with no difference.
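
One cheap tripwire for this kind of symptom is to verify on every iteration that the working directory has not silently changed (a debugging sketch; the wrapper name and usage are my own, not from the question):

```python
import os

def check_cwd_stable(iterable):
    """Yield items from iterable, raising if the process working
    directory changes between iterations -- a cheap way to catch
    stray os.chdir() calls made elsewhere in a project."""
    start = os.getcwd()
    for item in iterable:
        current = os.getcwd()
        if current != start:
            raise RuntimeError(
                f"working directory changed from {start!r} to {current!r}"
            )
        yield item
```

Usage would be `for config_file_name in check_cwd_stable(config_file_names): ...` — the loop then fails loudly at the offending iteration instead of with a confusing FileNotFoundError.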

Python 3.9.13

edit: Here is the rest of the script file:

import argparse
import yaml
from yaml.loader import SafeLoader
import os

def main():
    parser = argparse.ArgumentParser()

    parser.add_argument("-p", "--param", help="Path to parameter file directory", required=True)

    args = parser.parse_args()

    config_dir = args.param
    config_file_names = os.listdir(config_dir)

    for config_file_name in config_file_names:
        config_file_path = os.path.join(config_dir, config_file_name)
        with open(config_file_path) as f:
            print(config_file_path)
            config = yaml.load(f, Loader=SafeLoader)
        continue

        assert config["modus"] == "train" or config["modus"] == "selection", (
            "Illegal Argument." "modus must be either train or selection."
        )

        if config["modus"] == "train":
            for model in config["model"]:
                for loss in config["loss"]:
                    for optimizer in config["optimizer"]:
                        train.train_seq(
                            trainfile=config["train_file"],
                            lr=config["learning_rate"],
                            epochs=config["epochs"],
                            batch_size=config["batch_size"],
                            loss=loss,
                            optimizer=optimizer,
                            model_name=model,
                            encoding=config["encoding"],
                            patience=config["patience"],
                            cv_split=config["cv_split"],
                            val_files=config["val_file"],
                            test_size=config["test_size"],
                            early_stopping=config["early_stopping"],
                            monitor=config["monitor"],
                            shuffle=config["shuffle"],
                            validation_split=config["validation_split"],
                            x_values=config["x_values"],
                            y_values=config["y_values"],
                            regression=config["regression"],
                            momentum=config["momentum"],
                            sep=config["sep"]
                        )
        elif config["modus"] == "selection":
            utils.find_best_model(config["path_to_models"], config["selection_criterion"])


if __name__ == "__main__":
    import utils
    utils.enable_gpu_mem_growth()
    import train
    main()

As far as I can tell, everything used in the loop comes from the file contents stored in the dict `config`.


Solution

  • Somewhere in the depths of the project the working directory was being changed. Of course this is not visible in the excerpts posted above.

    I solved the problem by making this change:

    base_dir = os.getcwd()  # remember the original working directory

    for config_file_name in config_file_names:
        os.chdir(base_dir)  # reset it at the start of every iteration

    This resets the working directory to the original one at the start of each iteration. I found the cause by printing the working directory inside the loop and inspecting it in the debugger. If you are experiencing something similar, search your project for `import os`, `os.chdir`, or `os.getcwd`; these can lead you to the cause. In my opinion, changing the working directory is bad practice, and my problem is an example of why.
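
    An alternative that avoids fighting over the working directory is to resolve all paths to absolute ones before the loop, so a later `os.chdir()` anywhere in the project cannot invalidate them (a sketch; the helper name is my own):

    ```python
    import os

    def absolute_config_paths(config_dir):
        """Return absolute paths to every file in config_dir,
        resolved up front so later os.chdir() calls elsewhere
        cannot break them."""
        config_dir = os.path.abspath(config_dir)
        return [os.path.join(config_dir, name) for name in os.listdir(config_dir)]
    ```

    The loop would then iterate over `absolute_config_paths(args.param)` and open each path directly, with no dependence on the current working directory.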