Tags: python, python-3.x, rpy2, databricks, azure-databricks

How to install python package 'rpy2' in Databricks?


I am trying to install and use the pymer4 package in Databricks, which also requires rpy2. I can install both under Libraries on the cluster, and the status even shows "Installed", but importing them raises an error, as if they were not installed properly. On my local machine everything works.

from pymer4.test_install import test_install

Error is:


ImportError: No module named 'pandas.core.dtypes'
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<command-2946392196605768> in <module>()
----> 1 from pymer4.test_install import test_install

/databricks/python/lib/python3.5/site-packages/pymer4/__init__.py in <module>()
      6            "__version__"]
      7 
----> 8 from .models import Lmer, Lm
      9 from .simulate import (easy_multivariate_normal,
     10                        simulate_lm,

/databricks/python/lib/python3.5/site-packages/pymer4/models.py in <module>()
      2 import rpy2.robjects as robjects
      3 from rpy2.robjects.packages import importr
----> 4 from rpy2.robjects import pandas2ri
      5 import rpy2
      6 from copy import copy

/databricks/python/lib/python3.5/site-packages/rpy2/robjects/pandas2ri.py in <module>()
     14 from pandas.core.series import Series as PandasSeries
     15 from pandas.core.index import Index as PandasIndex
---> 16 from pandas.core.dtypes.api import is_datetime64_any_dtype
     17 import pandas
     18 import numpy

ImportError: No module named 'pandas.core.dtypes'
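
The last frame of the traceback shows rpy2's pandas2ri module importing pandas.core.dtypes, so the failure points at the pandas build on the cluster rather than at rpy2 or pymer4 themselves. One way to confirm this from a notebook cell is to probe for the submodule directly. This is a minimal sketch; the helper name module_available is hypothetical and not part of any of the libraries involved.

```python
import importlib.util


def module_available(dotted_name):
    """Return True if the dotted module path can be located, else False.

    find_spec imports the parent packages of a dotted name, so a missing
    parent raises ModuleNotFoundError (a subclass of ImportError).
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except (ImportError, ValueError):
        return False


if __name__ == "__main__":
    # The submodule that rpy2's pandas2ri tried to import in the traceback.
    print("pandas.core.dtypes found:", module_available("pandas.core.dtypes"))
```

If this prints False while pandas itself imports fine, the installed pandas is likely a different version from the one rpy2 expects.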

Solution

  • I was able to solve my own problem. It was a version mismatch among pymer4's dependencies. I pinned the following packages to the latest versions released before pymer4's own release date, and it worked:

    matplotlib==3.0.2
    pandas==0.23.4
    rpy2==2.9.4
    tzlocal
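
To check that the cluster actually picked up these versions after the libraries are installed, a short comparison like the one below can be run in a notebook cell. It is only a sketch: the names PINNED, installed_version and find_mismatches are made up for illustration, and it assumes Python 3.8 or newer for importlib.metadata. On the Python 3.5 runtime shown in the traceback, pkg_resources.get_distribution(name).version would be the closest equivalent.

```python
from importlib import metadata  # assumption: Python 3.8+

# Pins taken from the list above; None means "any version" (tzlocal is unpinned).
PINNED = {
    "matplotlib": "3.0.2",
    "pandas": "0.23.4",
    "rpy2": "2.9.4",
    "tzlocal": None,
}


def installed_version(name):
    """Return the installed version string for a package, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Map each missing or wrongly versioned package to (wanted, found)."""
    mismatches = {}
    for name, wanted in pins.items():
        found = lookup(name)
        if found is None or (wanted is not None and found != wanted):
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(PINNED).items():
        print("{}: wanted {}, found {}".format(name, wanted or "any", found))
```

An empty result means every pinned package is present at the requested version. The lookup parameter exists only so the comparison can be exercised without touching the real environment.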