r, drake-r-package

Is there a way to change the absolute path of file_in inputs to drake without invalidating downstream targets?


For my project, I sometimes need to restructure my project data directory or simply change its mount point (e.g., after upgrading to macOS Catalina, which no longer allows non-standard subdirectories of /).

I've noticed that, even though the contents of the input directories don't change, changing only the path prefix they share invalidates all targets.

Is there a way to avoid this?


Solution

  • My main recommendation here is to use relative paths instead of absolute paths. If you have ever used the here package, it is the same idea. But instead of writing file.path(here::here(), "path/to/file.txt"), I recommend writing file_in("path/to/file.txt") in the plan, assuming you intend to call drake::make() from the directory that contains path/ (typically the project root).

    That's for future reference. In your current situation, if you are absolutely sure all the files are up to date and you don't want to spend time rebuilding targets, then you can use make(plan, trigger = trigger(command = FALSE, file = FALSE)) to tell drake to stop worrying about whether commands or files change. (Why commands? Because that's where the file_in() calls will be, and I assume you are changing the paths inside.)

    Edit

    I realize now that I did not fully understand your question the first time. But since I also work with data in a similar way as you, I think there is an answer. Say you have a plan like this:

    plan <- drake_plan(
      data = get_data(file_in("DRIVE_NAME/file.db"))
    )
    

    And your mount point changes, making it look like this:

    plan <- drake_plan(
      data = get_data(file_in("DIFFERENT_MOUNT_POINT/file.db"))
    )
    

    As you noted, the struggle comes from that changing path. What you can do here is track the file manually. First, use the "change" trigger to watch the file, so we don't need file_in(). Second, use ignore() around the changing path so drake thinks the command stays the same. That way, there is no superfluous invalidation when you change mount points.

    plan <- drake_plan(
      data = target(
        get_data(ignore("WHATEVER_MOUNT_POINT/file.db")),
        trigger = trigger(change = file.mtime("WHATEVER_MOUNT_POINT/file.db"))
      ) 
    )
    

    Now, whenever the modification time changes, the data gets invalidated. But you can change WHATEVER_MOUNT_POINT without incurring invalidation. I would ordinarily choose a file hash for the trigger (that's what file_in() tells drake to do as a last resort), but I chose the time stamp for you because file.mtime() is fast, your data is large, and it hardly ever changes.
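
    For smaller files where a hash is affordable, here is a minimal base R sketch of what a content-based value for the "change" trigger could look like. It is an illustration, not part of the original answer: the temporary directories stand in for two mount points, and tools::md5sum() is just one possible hash function. Note the unname() call: md5sum() names its result after the file path, and that name would differ between mount points.

    ```r
    # Two temporary directories play the role of the old and new mount points.
    # These names are stand-ins for illustration, not real project paths.
    old_mount <- tempfile("old_mount_")
    new_mount <- tempfile("new_mount_")
    dir.create(old_mount)
    dir.create(new_mount)

    old_path <- file.path(old_mount, "file.db")
    new_path <- file.path(new_mount, "file.db")
    writeLines("pretend this is a database", old_path)
    file.copy(old_path, new_path)

    # md5sum() returns a character vector named by file path.
    # unname() drops that name so the value depends only on the contents.
    hash_old <- unname(tools::md5sum(old_path))
    hash_new <- unname(tools::md5sum(new_path))

    identical(hash_old, hash_new)  # TRUE: same contents, different paths
    ```

    In the plan above, this would presumably mean writing trigger(change = unname(tools::md5sum("WHATEVER_MOUNT_POINT/file.db"))) in place of the file.mtime() call. The trade-off is that the whole file has to be read every time drake checks the target, which is why the time stamp is the better fit for large data.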