python, r, rpy2, lme4

Retain R dataframe index values when converting to a pandas dataframe


I fitted a mixed effects model using the R package lme4 (base R version 3.5.2), run via rpy2 2.9.4 from Python 3.6.

I am able to print the random effects as an indexed data frame, where the index values are the values of the categorical variable(s) used to define the groups (using the radon data):

import rpy2.robjects as ro
from rpy2.robjects import pandas2ri, default_converter
from rpy2.robjects.conversion import localconverter
from rpy2.robjects.packages import importr

lme4 = importr('lme4')

mod = lme4.lmer(**kwargs) # Omitting arguments for brevity
r_ranef = ro.r['ranef']
re = r_ranef(mod)
print(re[1])
                           Uppm   (Intercept)         floor   (Intercept)
AITKIN            -0.0026783361 -2.588735e-03  1.742426e-09 -0.0052003670
ANOKA             -0.0056688495 -6.418760e-03 -4.482764e-09 -0.0128942943
BECKER             0.0021906431  1.190746e-03  1.211201e-09  0.0023920238
BELTRAMI           0.0093246041  8.190172e-03  5.135196e-09  0.0164527872
BENTON             0.0018747838  1.049496e-03  1.746748e-09  0.0021082742
BIG STONE         -0.0073756824 -2.430404e-03  0.000000e+00 -0.0048823057
BLUE EARTH         0.0112939204  4.176931e-03  5.507525e-09  0.0083908075
BROWN              0.0069223055  2.544912e-03  4.911563e-11  0.0051123339

Converting this to a pandas DataFrame, the categorical values are lost from the index and replaced by integers:

pandas2ri.ri2py_dataframe(re[1])  # re is an R list of data frames

    Uppm  (Intercept)         floor  (Intercept)
0  -0.002678    -0.002589  1.742426e-09    -0.005200
1  -0.005669    -0.006419 -4.482764e-09    -0.012894
2   0.002191     0.001191  1.211201e-09     0.002392
3   0.009325     0.008190  5.135196e-09     0.016453
4   0.001875     0.001049  1.746748e-09     0.002108
5  -0.007376    -0.002430  0.000000e+00    -0.004882
6   0.011294     0.004177  5.507525e-09     0.008391
7   0.006922     0.002545  4.911563e-11     0.005112

How do I retain the values of the original index?

The documentation suggests that the as.data.frame output could contain a grp column, which might hold the values I'm after, but I'm struggling to call it through rpy2. For example,

r_ranef = ro.r['ranef.as.data.frame']

does not work.


Solution

  • Consider adding the row names as a new column in the R data frame, then use that column with set_index on the pandas data frame:

    base = importr('base')
    
    # ADD NEW COLUMN TO R DATA FRAME
    re[1] = base.transform(re[1], index = base.row_names(re[1]))
    
    # SET INDEX IN PANDAS DATA FRAME
    py_df = (pandas2ri.ri2py_dataframe(re[1])
                         .set_index('index')
                         .rename_axis(None)
            )
    

    To do this for every data frame in the list, use R's lapply and then a Python list comprehension to build a new list of indexed pandas data frames.

    base = importr('base')
    
    mod = lme4.lmer(**kwargs)          # Omitting arguments for brevity
    r_ranef = lme4.ranef(mod)
    
    # R LAPPLY
    new_r_ranef = base.lapply(r_ranef, lambda df: 
                              base.transform(df, index=base.row_names(df)))
    
    # PYTHON LIST COMPREHENSION
    py_df_list = [(pandas2ri.ri2py_dataframe(df)
                             .set_index('index')
                             .rename_axis(None)
                  ) for df in new_r_ranef]
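
The pandas half of this approach can be tried without R. The following is a minimal sketch, assuming the converted frame arrives with the row names in a column called 'index', as in the code above. The helper name restore_index is hypothetical, and the stand-in frame reuses a few values from the question's output rather than a real rpy2 conversion.

```python
import pandas as pd


def restore_index(df, column="index"):
    """Move `column` into the index and drop the leftover axis name.

    `restore_index` is a hypothetical helper, not part of rpy2 or pandas.
    """
    return df.set_index(column).rename_axis(None)


# Stand-in for the frame returned by the rpy2 conversion (assumption):
# the R row names were copied into an 'index' column before converting.
converted = pd.DataFrame({
    "Uppm": [-0.0026783361, -0.0056688495, 0.0021906431],
    "floor": [1.742426e-09, -4.482764e-09, 1.211201e-09],
    "index": ["AITKIN", "ANOKA", "BECKER"],
})

py_df = restore_index(converted)
print(py_df)
```

set_index moves the column into the index, and rename_axis(None) clears the index name so that the frame prints without an extra 'index' header row.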