I have an R data frame, saved in Database02.Rda. Loading it
import rpy2.robjects as robjects
robjects.r.load("Database02.Rda")
works fine. However:
print(robjects.r.names("df"))
yields
NULL
Also, as an example, column 214 (213 if we count starting with 0) is named REGION.
print(robjects.r.table(robjects.r["df"][213]))
works fine:
Region 1 Region 2 ...
9811 3451 ...
but we should also be able to do
print(robjects.r.table("df$REGION"))
This, however, results in
df$REGION
1
(which it does also for column names that do not exist at all); also:
print(robjects.r.table(robjects.r["df"]["REGION"]))
gives an error:
TypeError: SexpVector indices must be integers, not str
Now, the docs say, names can not be used for subsetting in python. Am I correct to assume that the column names are not imported whith the rest of the data when loading the data frame with python/rpy2? Am I thus correct that the easiest way to access them is to save and load them as a seperate list and construct a dict or so in python mapping the names to the column index numbers? This does not seem very generic, however. Is there a way to extract the column names directly?
The versions of R, python, rpy2 I use are: R: 3.2.2 python: 3.5.0 rpy2: 2.7.8
When doing the following, you are loading whatever objects are Database02.Rda
into R's "global environment".
import rpy2.robjects as robjects
robjects.r.load("Database02.Rda")
robjects.globalenv
is an Environement. You can list its content with:
tuple(robjects.globalenv.keys())
Now I am understanding that one of your objects is called df
. You can access it with:
df = robjects.globalenv['df']
if df
is a list or a data frame, you can access its named elements with
rx2
(the doc is your friend here again). To get the one called REGION
, do:
df.rx2("REGION")
To list all named elements in a list or dataframe that's easy:
tuple(df.names)