Save 2d numpy array to R file format using rpy2


This is a beginner's question: how do you save a 2D numpy array to a file in (compressed) R format using rpy2? To be clear, I want to save it from Python with rpy2 and later read it in using R. I would like to avoid CSV because the amount of data will be large.


Solution

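  • It looks like you want R's save command. I would use the pandas R interface (pandas.rpy, which was deprecated in pandas 0.16 and removed in 0.20 in favor of rpy2's own pandas2ri module) and do something like the following.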

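    import numpy as np
    from rpy2.robjects import r
    import pandas.rpy.common as com  # legacy pandas R interface
    from pandas import DataFrame
    # Build a 2 x 5 array and wrap it in a pandas DataFrame
    a = np.array([range(5), range(5)])
    df = DataFrame(a)
    # Convert to an R data.frame and bind it to the name "foo" in R
    df = com.convert_to_r_dataframe(df)
    r.assign("foo", df)
    # Write foo to disk in R's binary format, gzip-compressed
    r("save(foo, file='here.gzip', compress=TRUE)")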
    

    There may be a more elegant way, though; I'm open to better suggestions. The saved file can then be loaded in R:

    > load("here.gzip")
    > foo
      X0 X1 X2 X3 X4
    0  0  1  2  3  4
    1  0  1  2  3  4
    

    You can also bypass pandas entirely and use numpy2ri from rpy2, with something like:

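    import numpy as np
    from rpy2.robjects import r
    # Name used by older rpy2; newer releases expose numpy2rpy instead
    from rpy2.robjects.numpy2ri import numpy2ri
    # First row holds values far beyond the 32-bit integer range
    a = np.array([[i*2147483647**2 for i in range(5)], range(5)], dtype="uint64")
    a = np.array(a, dtype="float64") # <- convert to double precision numeric since R doesn't have unsigned ints
    ro = numpy2ri(a)  # convert the numpy array to an R matrix
    r.assign("bar", ro)
    r("save(bar, file='another.gzip', compress=TRUE)")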
    

    Then, in R:

    > load("another.gzip")
    > bar
         [,1]         [,2]         [,3]         [,4]         [,5]
    [1,]    0 4.611686e+18 9.223372e+18 1.383506e+19 1.844674e+19
    [2,]    0 1.000000e+00 2.000000e+00 3.000000e+00 4.000000e+00
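
One caveat about the float64 conversion used for bar: a double has a 53-bit significand, so not every 64-bit integer survives the conversion exactly. The sketch below is only an illustration and does not need rpy2; it relies on Python's float being the same IEEE 754 double as numpy's float64 and R's numeric. The helper name is_exact_as_double is made up for this example.

```python
# Python's float is an IEEE 754 double, the same representation as
# numpy float64 and R's numeric type.

def is_exact_as_double(n: int) -> bool:
    """Return True if the integer n survives a round trip through a double."""
    return int(float(n)) == n  # hypothetical helper, for illustration only

big = 2147483647 ** 2  # the multiplier used for the first row of the array
values = [i * big for i in range(5)]

# Show each original integer, its double approximation, and whether it is exact
for v in values:
    print(v, float(v), is_exact_as_double(v))

# Every integer up to 2**53 is exactly representable; beyond that, gaps appear
print(is_exact_as_double(2 ** 53))      # True
print(is_exact_as_double(2 ** 53 + 1))  # False
```

So the nonzero values in the first row of bar are close approximations of the original uint64 values rather than exact copies, which is fine when roughly 15 to 16 significant digits are enough.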