
Python code integration in R: works during development but not after release


I am developing an R package in RStudio. I have many functions written in Python that I want to call from R. For now I have implemented just a simple reverse-complement function for a DNA sequence. While I write and build the project in RStudio, the code works fine. Once I release it on GitHub and install it on another machine from the repo, it installs successfully, but the function fails when I call it. The details are shown below:

$ cat rev.py

def revcompl (s):
    rev_s = ''.join([{'M':'M', 'A':'T','C':'G','G':'C','T':'A'}[B] for B in s][::-1]).replace ("CM","MG")
    return rev_s

$ cat reverse_complement.R

#' Reverse complement for a given DNA sequence
#'
#' \code{reverse_complement} Reverse complement of a DNA sequence
#'
#' @usage reverse_complement (sequence)
#'
#' @param sequence A DNA sequence
#' @export
#'

reverse_complement <- function(sequence) {
    revFile = system.file("python", "rev.py", package = "rpytrial")
    print (revFile)
    python.load(revFile)
    rev_strand = python.call ("revcompl", sequence)
    return (rev_strand)
}

After doing install_github, when I run reverse_complement("AAAAA"), I get the following error:

 [1] ""
   File "<string>", line 3
     except Exception as e:_r_error = e.__str__()
          ^
 IndentationError: expected an indented block
 Error in python.exec(python.command) : name 'revcompl' is not defined
 In addition: Warning message:
 In file(con, "r") :
   file("") only supports open = "w+" and open = "w+b": using the former

From the error, I can tell that it is not finding the file path. Is there a way to fix it?

Thanks, Satya


Solution

  • tl;dr: move your python directory to inst/python.

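    For additional files to be installed with the package, and found by system.file(), they must be inside the inst directory or in one of the other specific directories mandated by the Writing R Extensions document. When system.file() cannot find a file it returns an empty string, which is the [1] "" printed in your output; python.load() is then called with an empty path and fails.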

    From the "Non-R scripts in packages" section of Writing R Extensions:

    Subdirectory exec could be used for scripts for interpreters such as the shell, BUGS, JavaScript, Matlab, Perl, php (amap), Python or Tcl (Simile), or even R. However, it seems more common to use the inst directory, for example WriteXLS/inst/Perl, NMF/inst/m-files, RnavGraph/inst/tcl, RProtoBuf/inst/python and emdbook/inst/BUGS and gridSVG/inst/js.

    Also see the Package subdirectories section.

    So if you want your python code to be found via system.file("python", "rev.py", package = "rpytrial") it must be in inst/python/rev.py. I think putting it in exec/rev.py and using system.file("exec", "rev.py", package = "rpytrial") would also be an option (although I've never tried that approach).
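    As a rough pre-release check, the layout rule above can be sketched in Python. The helper name misplaced_scripts and the set INSTALLED_DIRS are hypothetical and only for illustration; the sketch assumes inst/ and exec/ are the only places you intend to keep scripts, and it simply lists .py files in the source tree that sit anywhere else.

```python
from pathlib import Path

# Assumption for this sketch: scripts are only expected under inst/ (whose
# contents are copied to the top level of the installed package) or exec/.
INSTALLED_DIRS = {"inst", "exec"}


def misplaced_scripts(pkg_root, suffix=".py"):
    """List scripts in the source tree that are outside inst/ and exec/.

    Paths are returned relative to pkg_root, with forward slashes.
    """
    root = Path(pkg_root)
    misplaced = []
    for path in sorted(root.rglob(f"*{suffix}")):
        relative = path.relative_to(root)
        # The first path component is the top-level directory (or the file
        # itself, if the script sits directly in the package root).
        if relative.parts[0] not in INSTALLED_DIRS:
            misplaced.append(relative.as_posix())
    return misplaced
```

    For the package in the question, a tree containing python/rev.py would be reported, while inst/python/rev.py would not.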

    If you have strong feelings about having the python/ directory directly within the package's 'head' directory while developing, you could write a custom build script that (1) makes a copy of the whole package directory; (2) moves python to inst/python; and (3) builds the package from the copied directory (I can't see how to make this work via devtools::install_github(), though).
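    A minimal sketch of such a build script, in Python, might look like the following. The function names stage_package and build_package are hypothetical, the ignored directories are just a guess at what you would not want copied, and the sketch assumes R is on the PATH. I have not run this against a real package, so treat it as a starting point only.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def stage_package(pkg_dir, staging_root):
    """Copy the package into staging_root and move python/ to inst/python/ in the copy."""
    src = Path(pkg_dir).resolve()
    staged = Path(staging_root) / src.name
    # (1) Copy the whole package directory; the working tree is left untouched.
    shutil.copytree(src, staged, ignore=shutil.ignore_patterns(".git", ".Rproj.user"))
    # (2) Move python/ under inst/ so that it is installed with the package.
    python_dir = staged / "python"
    target = staged / "inst" / "python"
    if python_dir.is_dir():
        if target.exists():
            raise FileExistsError(f"{target} already exists in the package")
        target.parent.mkdir(exist_ok=True)
        shutil.move(str(python_dir), str(target))
    return staged


def build_package(pkg_dir):
    """Stage the package in a temporary directory and run R CMD build on the copy."""
    with tempfile.TemporaryDirectory() as tmp:
        staged = stage_package(pkg_dir, tmp)
        # (3) Build from the copy; R CMD build writes the tarball to the
        # current working directory.
        subprocess.run(["R", "CMD", "build", str(staged)], check=True)
```

    Calling build_package("path/to/rpytrial") would then produce a source tarball in which system.file("python", "rev.py", package = "rpytrial") should resolve after installation.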