I have a bunch of .RData time-series files and would like to load them directly into Python without first converting them to another format (such as .csv). Any ideas on the best way to accomplish this?
People ask this sort of thing on the R-help and R-devel mailing lists, and the usual answer is that the code is the documentation for the .RData file format. So writing a reader for it in any other language is hard++.
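To show what "the code is the documentation" means in practice, here is a sketch that reads only the header of an .RData file. It assumes a file written by R's save(), which after decompression starts with a five-byte magic such as RDX2\n or RDX3\n: "RD", then X, A or B for XDR, ASCII or native-binary serialization, then the format version digit, then a newline. save() compresses with gzip by default, but its compress argument can also give bzip2, xz, or no compression. The names rdata_header and describe_header are hypothetical. Everything after those five bytes is R's serialization format, which is the part you would actually have to reimplement.

```python
import bz2
import gzip
import lzma

# Leading bytes of the compressors that R's save() can use.
_COMPRESSED = [
    (b"\x1f\x8b", gzip.open),      # gzip (save's default)
    (b"BZh", bz2.open),            # bzip2
    (b"\xfd7zXZ\x00", lzma.open),  # xz
]

# Third byte of the header: how the objects were serialized.
_SERIALIZATION = {b"X": "XDR (big-endian binary)", b"A": "ASCII", b"B": "native binary"}


def rdata_header(path):
    """Return the first five decompressed bytes of a save() file, e.g. b'RDX3\\n'."""
    with open(path, "rb") as raw:
        start = raw.read(6)
    opener = open  # fall back to reading the file as uncompressed
    for magic, candidate in _COMPRESSED:
        if start.startswith(magic):
            opener = candidate
            break
    with opener(path, "rb") as fh:
        return fh.read(5)


def describe_header(header):
    """Split a header like b'RDX3\\n' into (serialization format, version)."""
    if len(header) != 5 or header[:2] != b"RD" or header[4:] != b"\n":
        raise ValueError("not a file written by R's save()")
    return _SERIALIZATION.get(header[2:3], "unknown"), int(header[3:4])


# Example (hypothetical file name):
# describe_header(rdata_header("series.RData"))
```

Knowing the header only tells you how the rest is encoded; decoding the R objects themselves is still the hard++ part.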
I think the only reasonable way is to install RPy2 and call R's load function through it, converting the results to appropriate Python objects as you go. An .RData file can contain structured objects as well as plain tables, so watch out.
Linky: http://rpy.sourceforge.net/rpy2/doc-2.4/html/
Quicky:
>>> import rpy2.robjects as robjects
>>> robjects.r['load'](".RData")
The objects are now loaded into the R workspace.
>>> robjects.r['y']
<FloatVector - Python:0x24c6560 / R:0xf1f0e0>
[0.763684, 0.086314, 0.617097, ..., 0.443631, 0.281865, 0.839317]
That's a plain numeric vector. d is a data frame, so I can index it by position to get columns:
>>> robjects.r['d'][0]
<IntVector - Python:0x24c9248 / R:0xbbc6c0>
[ 1, 2, 3, ..., 8, 9, 10]
>>> robjects.r['d'][1]
<FloatVector - Python:0x24c93b0 / R:0xf1f230>
[0.975648, 0.597036, 0.254840, ..., 0.891975, 0.824879, 0.870136]
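The session above can be wrapped in a small helper. This is a sketch that assumes R and rpy2 are installed; load_rdata and frame_to_dict are hypothetical names. Instead of the global workspace it loads into a fresh R environment made with R's new.env, and it relies on R's load() returning the names of the objects it created, so you don't need to know them in advance.

```python
def load_rdata(path):
    """Load every object in an .RData file and return {name: rpy2 object}.

    Hypothetical helper; needs R and rpy2 installed.
    """
    # Imported here so frame_to_dict below can be used without rpy2.
    import rpy2.robjects as robjects

    # A fresh environment keeps objects from different files apart
    # instead of piling them all into R's global workspace.
    env = robjects.r["new.env"]()
    # R's load() returns the names of the objects it just created.
    names = robjects.r["load"](path, envir=env)
    return {name: env[name] for name in names}


def frame_to_dict(frame):
    """Turn a data frame into {column name: list of values}.

    Iterating over an rpy2 DataFrame yields its columns and .names
    holds the column names, so the two are paired up here.
    """
    return {str(name): list(column) for name, column in zip(frame.names, frame)}


# Example, using the objects from the session above:
# objs = load_rdata(".RData")
# y = list(objs["y"])           # plain Python floats
# d = frame_to_dict(objs["d"])  # {column name: [values]}
```

The "watch out" still applies: list() only does a shallow conversion, so a factor column will likely come back as its integer codes rather than its labels, and nested objects such as lists or model fits need their own handling.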