Replacement functions in R that don't take input


This seems very related to several other questions that have been asked (this one for example), but I can't quite figure out how to do exactly what I want. Maybe replacement functions are the wrong tool for the job, which would also be a perfectly acceptable answer. I am much more familiar with Python than R and I can easily think of how I want to do it in Python but I can't quite get my head around how to approach it in R.

The problem: I am trying to modify an object in place within a function, without having to return it, but I don't need to pass in the value that modifies it, because this value is the result of a function call that's already contained in the object.

More specifically, I have a list (technically it's an S3 class, but I don't think that's actually relevant to this issue) that contains some things relating to a process started with a processx::process$new() call. For reproducibility, here's a toy shell script you can run, and the code to get my res object:

echo '
echo $1
sleep 1s
echo "naw 1"
sleep 1s
echo "naw 2"
sleep 1s
echo "naw 3"
sleep 1s
echo "naw 4"
sleep 1s
echo "naw 5"
echo "All done."
' > naw.sh

Then my wrapper is something like this:

run_sh <- function(.args, ...) {
  p <- processx::process$new("sh", .args, ..., stdout = "|", stderr = "2>&1")
  return(list(process = p, orig_args = .args, output = NULL))
}

res <- run_sh(c("naw.sh", "hello"))

And res should look like

$process
PROCESS 'sh', running, pid 19882.

$orig_args
[1] "naw.sh" "hello"

$output
NULL

So, the specific issue here is a bit peculiar to process$new, but I think the general principle is relevant. I am trying to collect all the output from this process after it has finished, but you can only call res$process$read_all_output_lines() (or its sibling functions) once: the first call returns the contents of the buffer, and subsequent calls return nothing. Also, I am going to start a bunch of these processes and then come back to "check on them", so I can't just call res$process$read_all_output_lines() right away, because then it will wait for the process to finish before the function returns, which is not what I want.

So I'm trying to store the output of that call in res$output and then just keep that and return it on subsequent calls. Soooo... I need to have a function to modify res in place with res$output <- res$process$read_all_output_lines().

Here's what I tried, based on guidance like this, but it didn't work.

get_output <- function(.res) {
  # check if process is still alive (as of now, can only get output from finished process)
  if (.res$process$is_alive()) {
    warning(paste0("Process ", .res$process$get_pid(), " is still running. You cannot read the output until it is finished."))
    invisible()
  } else {
    # if output has not been read from buffer, read it
    if (is.null(.res$output)) {
      output <- .res$process$read_all_output_lines()
      update_output(.res) <- output
    }
    # return output
    return(.res$output)
  }
}

`update_output<-` <- function(.res, ..., value) {
  .res$output <- value
  .res
}

Calling get_output(res) works the first time, but it does not store the output in res$output to be accessed later, so subsequent calls return nothing.
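
As far as I can tell, this can be reproduced without processx at all. Here is a minimal sketch using a plain list in place of res (the names `set_output<-` and try_update are made up for illustration): the replacement call inside the function only rebinds the function's local copy, so the caller's object never changes.

```r
# Hypothetical replacement function, same shape as update_output<- above
`set_output<-` <- function(x, value) {
  x$output <- value
  x
}

# Hypothetical stand-in for get_output(): it updates its argument locally
try_update <- function(x) {
  set_output(x) <- "stored"   # rebinds the local name x only
  x$output                    # the local copy does hold the new value
}

obj <- list(output = NULL)
returned <- try_update(obj)

returned             # "stored": visible inside the function
is.null(obj$output)  # TRUE: the caller's obj was never modified
```

So the replacement function itself does its job; the problem seems to be that it is applied to a copy that is discarded when the outer function returns.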

I also tried something like this:

`get_output2<-` <- function(.res, value) {
  # check if process is still alive (as of now, can only get output from finished process)
  if (.res$process$is_alive()) {
    warning(paste0("Process ", .res$process$get_pid(), " is still running. You cannot read the output until it is finished."))
    .res
  } else {
    # if output has not been read from buffer, read it
    if (is.null(.res$output)) {
      output <- .res$process$read_all_output_lines()
      update_output(.res) <- output
    }
    # return output
    print(value)
    .res
  }
}

This just throws away the value, but it feels silly because you have to call it with an assignment, like get_output2(res) <- "fake", which I hate.

Obviously I could also just return the modified res object, but I don't like that because then the user has to know to do res <- get_output(res) and if they forget to do that (the first time) then the output is lost to the ether and can never be recovered. Not good.

Any help is much appreciated!


Solution

  • After further information from the OP, it looks as if what is needed is a way to write to the existing variable in the environment that calls the function. This can be done with non-standard evaluation:

    check_result <- function(process_list) 
    { 
      # Capture the name of the passed object as a string
      list_name <- deparse(substitute(process_list))
    
      # Check the object exists in the calling environment
      if(!exists(list_name, envir = parent.frame()))
         stop("Object '", list_name, "' not found")
    
      # Create a local copy of the passed object in function scope
      copy_of_process_list <- get(list_name, envir = parent.frame())
    
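      # If the process has completed and its output has not been stored yet,
      # write the output to the copy and assign the copy to the name of the
      # object in the calling frame. The is.null() check stops a later call
      # from overwriting the stored output with an already-drained buffer.
      if(length(copy_of_process_list$process$get_exit_status()) > 0 &&
         is.null(copy_of_process_list$output))
      {
        copy_of_process_list$output <- copy_of_process_list$process$read_all_output_lines()
        assign(list_name, copy_of_process_list, envir = parent.frame()) 
      }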
      print(copy_of_process_list)
    }
    

    This will update res if the process has completed; otherwise it leaves it alone. In either case it prints out the current contents. If this is client-facing code you will want further type-checking logic on the object passed in.
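
    The mechanism can be seen in isolation with a plain list and no processx involved. This is only a sketch: mark_done and job are hypothetical names used for illustration, and the hard-coded "finished" value stands in for the real output.

    ```r
    # Hypothetical helper: writes a value into the caller's variable by name
    mark_done <- function(obj) {
      # Name of the variable the caller passed in, as a string
      obj_name <- deparse(substitute(obj))

      # Fail early if that name does not exist where the function was called
      if (!exists(obj_name, envir = parent.frame()))
        stop("Object '", obj_name, "' not found")

      # Modify the local copy, then overwrite the caller's variable with it
      obj$output <- "finished"
      assign(obj_name, obj, envir = parent.frame())
      invisible(obj)
    }

    job <- list(id = 1, output = NULL)
    mark_done(job)
    job$output   # "finished"
    ```

    Note that this relies on the argument being a bare variable name. Something like mark_done(jobs[[1]]) deparses to "jobs[[1]]", which is not the name of an object, so the exists() check stops with an error.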

    So I can do

    res <- run_sh(c("naw.sh", "hello"))
    

    and when I check the contents of res I have:

    res
    #> $`process`
    #> PROCESS 'sh', running, pid 1112.
    #> 
    #> $orig_args
    #> [1] "naw.sh" "hello" 
    #> 
    #> $output
    #> NULL
    

    and if I immediately run:

    check_result(res)
    #> $`process`
    #> PROCESS 'sh', running, pid 1112.
    #> 
    #> $orig_args
    #> [1] "naw.sh" "hello" 
    #> 
    #> $output
    #> NULL
    

    we can see that the process hasn't completed yet. However, if I wait a few seconds and call check_result again, I get:

    check_result(res)
    #> $`process`
    #> PROCESS 'sh', finished.
    #> 
    #> $orig_args
    #> [1] "naw.sh" "hello" 
    #> 
    #> $output
    #> [1] "hello"     "naw 1"     "naw 2"     "naw 3"     "naw 4"     "naw 5"    
    #> [7] "All done."
    

    and without explicitly writing to res, it has updated via the function:

    res
    #> $`process`
    #> PROCESS 'sh', finished.
    #> 
    #> $orig_args
    #> [1] "naw.sh" "hello" 
    #> 
    #> $output
    #> [1] "hello"     "naw 1"     "naw 2"     "naw 3"     "naw 4"     "naw 5"    
    #> [7] "All done."
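
    An alternative that may be worth considering, and which is not the approach shown above, is to keep the result in an environment instead of a list. Environments are passed by reference in R, so a function can cache the output on the object without non-standard evaluation and without the caller reassigning anything. The sketch below is untested against processx: new_result, cached_output and make_reader are hypothetical names, and make_reader only imitates a buffer that can be read once.

    ```r
    # Hypothetical constructor: the result lives in an environment, which R
    # passes by reference, so functions can update it without returning it
    new_result <- function(read_once) {
      res <- new.env()
      res$read_once <- read_once  # stand-in for process$read_all_output_lines()
      res$output <- NULL
      res
    }

    # Hypothetical accessor: reads the buffer at most once and caches the result
    cached_output <- function(res) {
      if (is.null(res$output)) {
        res$output <- res$read_once()
      }
      res$output
    }

    # Stand-in for a buffer that can only be drained once
    make_reader <- function(lines) {
      function() {
        out <- lines
        lines <<- character(0)
        out
      }
    }

    res_env <- new_result(make_reader(c("hello", "All done.")))
    first <- cached_output(res_env)
    second <- cached_output(res_env)
    identical(first, second)  # TRUE: the second call returns the cached lines
    ```

    The trade-off is that the object no longer behaves like an ordinary list: copies of res_env all point at the same underlying state.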