Tags: python, r, roxygen2, rpython

How do you import a Python library within an R package using rPython?


The basic question is this: suppose I am writing R functions that call Python via rPython, and I want to put them in a package. That's simple: it's irrelevant that the R function wraps Python, and you proceed as usual, e.g.

# trivial example
# library(rPython)
add <- function(x, y) {
  python.assign("x", x)
  python.assign("y", y)
  python.exec("result = x+y")
  result <- python.get("result")
  return(result)
}

But what if the Python code inside the R functions requires a Python library to be imported first? e.g.

# python code, not R
import numpy as np
print(np.sin(np.deg2rad(90)))

# R function that calls Python via rPython
# *this function will not run without first executing `import numpy as np`
print_sin <- function(degree){
   python.assign("degree", degree)
   python.exec('result = np.sin(np.deg2rad(degree))')
   result <- python.get('result')
   return(result)
}

If you run this without importing the library numpy, you will get an error.
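
The failure can be reproduced in plain Python, without rPython. This is a minimal sketch, assuming that each python.exec call behaves like running code against one shared namespace (an assumption about rPython, not something stated above). The `session` dict and `py_exec` helper are hypothetical stand-ins, and the standard-library math module replaces numpy so the snippet has no dependencies:

```python
# Hypothetical stand-in for the interpreter state shared by python.exec calls.
session = {}

def py_exec(code):
    """Run code in the shared namespace, like a python.exec call (assumed)."""
    exec(code, session)

# Equivalent of python.assign("degree", 90).
session["degree"] = 90

# No import has been executed yet, so the name `m` is undefined.
try:
    py_exec("result = m.sin(m.radians(degree))")
    failed = False
except NameError:
    failed = True

# After a one-time import, the same statement succeeds.
py_exec("import math as m")
py_exec("result = m.sin(m.radians(degree))")
result = session["result"]
```

Under that assumption, the import only has to happen once per session, before the first call that needs it.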

How do you import a Python library in an R package? How do you comment it with roxygen2?

It appears the standard approach is this:

# R function that calls Python via rPython
# *numpy is imported inside the function, on every call
print_sin <- function(degree){
   python.assign("degree", degree)
   python.exec('import numpy as np')
   python.exec('result = np.sin(np.deg2rad(degree))')
   result <- python.get('result')
   return(result)
}

But then each call to the R function imports the entire Python library again.


Solution

  • As @Spacedman and @DirkEddelbuettel suggest, you could add a .onLoad/.onAttach function to your package that calls python.exec to import the modules that users of your package will always need.

    You could also test whether the module has already been imported before importing it, but (a) that gets you into a bit of a circular problem, because you need to import sys in order to perform the test, and (b) the answers to that question suggest that, at least in terms of performance, it shouldn't matter, e.g.

    If you want to optimize by not importing things twice, save yourself the hassle because Python already takes care of this.

    (although admittedly there is some discussion elsewhere on that page about possible scenarios where there could be a performance cost). But maybe your concern is stylistic rather than performance-oriented ...
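
The point about Python handling repeated imports can be checked directly. This is a small sketch, using the standard-library json module as a stand-in for numpy (an assumption made to keep the snippet dependency-free). After the first import, the module is cached in sys.modules, and later import statements return the same object instead of reloading it:

```python
import sys

import json as first   # first import: loads the module and caches it

# The "already imported?" test mentioned above; note that it needs sys.
already_loaded = "json" in sys.modules

import json as second  # repeated import: a cache lookup, not a reload

# Both names are bound to the single cached module object.
same_object = first is second
```

So if rPython keeps one Python interpreter alive for the R session (assumed here), a python.exec('import numpy as np') inside each R function should cost little after the first call, and moving the import into .onLoad is mostly a matter of tidiness.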