
Is there a way to access R data frame column names in python/rpy2?


I have an R data frame, saved in Database02.Rda. Loading it

import rpy2.robjects as robjects
robjects.r.load("Database02.Rda")

works fine. However:

print(robjects.r.names("df"))

yields

NULL

Also, as an example, column 214 (213 if we count starting with 0) is named REGION.

print(robjects.r.table(robjects.r["df"][213]))

works fine:

Region 1   Region 2   ...
    9811       3451   ...

but we should also be able to do

print(robjects.r.table("df$REGION"))

This, however, results in

df$REGION 
        1

(which it does also for column names that do not exist at all); also:

print(robjects.r.table(robjects.r["df"]["REGION"]))

gives an error:

TypeError: SexpVector indices must be integers, not str

Now, the docs say that names cannot be used for subsetting in Python. Am I correct to assume that the column names are not imported with the rest of the data when loading the data frame with python/rpy2? Am I thus correct that the easiest way to access them is to save and load them as a separate list and construct a dict or similar in Python that maps the names to the column index numbers? This does not seem very generic, however. Is there a way to extract the column names directly?

The versions of R, python, rpy2 I use are: R: 3.2.2 python: 3.5.0 rpy2: 2.7.8


Solution

  • When doing the following, you are loading whatever objects are in Database02.Rda into R's "global environment".

    import rpy2.robjects as robjects
    robjects.r.load("Database02.Rda")
    

    robjects.globalenv is an Environment. You can list its contents with:

    tuple(robjects.globalenv.keys())
    

    I understand that one of your objects is called df. You can access it with:

    df = robjects.globalenv['df']
    

    If df is a list or a data frame, you can access its named elements with rx2 (the doc is your friend here again). To get the one called REGION, do:

    df.rx2("REGION")
    

    Listing all named elements in a list or data frame is easy:

    tuple(df.names)
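
If a mapping from column name to index is still wanted, as the question suggests, it can be built from that tuple of names. The following is only a minimal sketch in plain Python: the `names` tuple is made-up sample data standing in for `tuple(df.names)`, and `build_column_index` is a hypothetical helper, not part of rpy2.

```python
# Hypothetical sample data; with rpy2 this would come from tuple(df.names).
names = ("ID", "AGE", "REGION")


def build_column_index(names):
    """Map each column name to its 0-based position.

    If a name occurs more than once, the first position is kept.
    """
    index = {}
    for position, name in enumerate(names):
        # setdefault leaves an existing entry untouched, so the first
        # occurrence of a duplicated name wins.
        index.setdefault(name, position)
    return index


column_index = build_column_index(names)
print(column_index["REGION"])
```

With a real data frame, `df[column_index["REGION"]]` uses the same 0-based integer indexing as `robjects.r["df"][213]` in the question, so the dict is only needed where integer positions are required; `df.rx2("REGION")` remains the more direct route.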