Tags: python, python-packaging, python-importlib

Long-lived resource access


I'm trying to package an application that ships some data files inside the package. The application performs a series of database queries and finally generates a script that is submitted to a batch system (SLURM). The script is based on a template (one of the resources I'm accessing) and refers to some helper scripts that are also packaged as resources:

./
  src/
      package/
              __init__.py
              code.py
              templates/
                        job.tmplt
              helpers/
                      helper1.sh
                      helper2.sh

According to the Python documentation, resources should be accessed like this:

from importlib.resources import as_file, files
from pathlib import Path

template_in = files("package.templates") / "job.tmplt"
helper_base = as_file(files("package.helpers"))

cmd = Path("job.cmd")
with template_in.open("r") as tmplt_in, cmd.open("w") as job_out:
    print(f"{helper_base.resolve()}/helper1.sh", file=job_out)

But that does not work: helper_base is not a real Path object (as_file returns a context manager), so it cannot give me the real path name. I understand that this is because the resource could live inside a zip file, and importlib.resources handles access to the file for me (and any subsequent cleanup, if needed). But my use case is different: I just need a reference to the file, and that reference must stay valid long after the script finishes (as it will be used by the batch system).
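
A minimal sketch of that cleanup behaviour, assuming a throwaway zipped package (demo_pkg and its helper1.sh are hypothetical stand-ins built on the fly, not the real package). It only illustrates that the path yielded by as_file is meant to be used inside the with block:

```python
import sys
import tempfile
import zipfile
from importlib.resources import as_file, files
from pathlib import Path

# Build a throwaway zipped package; "demo_pkg" is a hypothetical name.
workdir = Path(tempfile.mkdtemp())
archive = workdir / "demo_pkg.zip"
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("demo_pkg/__init__.py", "")
    zf.writestr("demo_pkg/helper1.sh", "#!/bin/sh\necho hello\n")

# Make the zip importable so importlib.resources can find the package.
sys.path.insert(0, str(archive))

resource = files("demo_pkg") / "helper1.sh"

# as_file() is a context manager: for a zipped resource it extracts a
# temporary copy and hands out a real filesystem path to it.
with as_file(resource) as helper_path:
    inside_path = Path(helper_path)
    existed_inside = inside_path.exists()

# Once the block is left, the temporary copy has been cleaned up.
existed_after = inside_path.exists()

print("inside the with block:", existed_inside)
print("after the with block: ", existed_after)
```

For a package installed as plain files on disk, as_file typically yields the existing path and removes nothing, which is why the problem only shows up with containers such as zip files.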

Is there any way to get the clean path to a resource that is not encapsulated in any container (like a zip) so it can be accessed after the script is finished?


Solution

  • As @sinoroc commented, there's no way to do that. The best approach is to have a "setup/installation" step where the data is copied from the package to a destination on disk outside the package (typically $XDG_DATA_HOME/$APPLICATION).

    The code I used for that is:

    from pathlib import Path
    from importlib.resources import files
    
    from platformdirs import user_data_path
    
    
    def _get_data_dir(application: str) -> Path:
        data_dir = user_data_path(application)
        # Create the directory (and any missing parents); no error if it already exists.
        data_dir.mkdir(parents=True, exist_ok=True)
    
        return data_dir
    
    
    def _populate_data_dir(data_dir: Path, package: str, force: bool = False) -> None:
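        for file in files(package).iterdir():
            # Skip sub-directories such as __pycache__; only plain files are copied.
            if not file.is_file():
                continue
            destination = data_dir / file.name
            if force or not destination.exists():
                with destination.open("w", encoding="UTF-8") as f_destination:
                    # Read with the same encoding that is used for writing.
                    f_destination.write(file.read_text(encoding="UTF-8"))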
    
    
    _populate_data_dir(_get_data_dir("myapp"), "my_package")
    

    This code assumes that there are no folders in the package directory (otherwise, you will have to process them differently) and that the files to be copied are small (as they are read entirely into memory).
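
If those two assumptions do not hold, one possible variation is to walk the resource tree recursively and stream each file in binary mode. The following is only a sketch: copy_resource_tree is a hypothetical helper name, and skipping __pycache__ is an assumption about what should not end up in the data directory.

```python
import shutil
from importlib.resources import files
from pathlib import Path


def copy_resource_tree(source, destination: Path, force: bool = False) -> None:
    """Recursively copy a resource tree (a Traversable) into a real directory.

    Hypothetical helper, not part of importlib.resources.
    """
    destination.mkdir(parents=True, exist_ok=True)
    for entry in source.iterdir():
        target = destination / entry.name
        if entry.is_dir():
            # Assumption: bytecode caches are not wanted in the data directory.
            if entry.name == "__pycache__":
                continue
            copy_resource_tree(entry, target, force=force)
        elif force or not target.exists():
            # Stream in chunks so large files are never held fully in memory.
            with entry.open("rb") as src, target.open("wb") as dst:
                shutil.copyfileobj(src, dst)


# Possible usage with the layout from the question (needs platformdirs):
#   from platformdirs import user_data_path
#   copy_resource_tree(files("package.helpers"), user_data_path("myapp") / "helpers")
```

Copying the bytes this way does not carry over file permissions, so helper scripts may need their executable bit set again (or be invoked through the shell explicitly) before the batch system runs them.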