python-3.x rpy2

Python: How to import the arules package from R using rpy2


I am trying to call some R functions from Python. In particular, I want to use the read.transactions function, which is part of the R package arules.

I did the following steps:

1- Open Anaconda and launch RStudio

In RStudio:

2- install.packages('arules', dep = TRUE)

3- loadNamespace('arules')

4- .libPaths()

which returned

[1] "D:/Anaconda3/Lib/site-packages/rpy2/R/win-library/3.4"
[2] "C:/Program Files/R/R-3.4.4/library" 

Next, I switched to Jupyter Notebook.

In Jupyter Notebook:

import rpy2
import rpy2.robjects as RObjects
from rpy2.robjects.packages import importr
utils = importr("utils")


d = {'print.me': 'print_dot_me', 'print_me': 'print_uscore_me'}
try:
    arules = importr('arules', robject_translations = d, lib_loc = "D:/Anaconda3/Lib/site-packages/rpy2/R/win-library/3.4")
except:
    arules = importr('arules', robject_translations = d, lib_loc = "C:/Program Files/R/R-3.4.4/library")

The outcome was:

---------------------------------------------------------------------------
RRuntimeError                             Traceback (most recent call last)
<ipython-input-3-5df30d28440c> in <module>()
      3 try:
----> 4     arules = importr('arules', robject_translations = d, lib_loc = "D:/Anaconda3/Lib/site-packages/rpy2/R/win-library/3.4")
      5 except:

~\Anaconda3\lib\site-packages\rpy2\robjects\packages.py in importr(name, lib_loc, robject_translations, signature_translation, suppress_messages, on_conflict, symbol_r2python, symbol_check_after, data)
    452                               _system_file(package = rname)):
--> 453         env = _get_namespace(rname)
    454         version = _get_namespace_version(rname)[0]

RRuntimeError: Error in loadNamespace(name) : there is no package called 'arules'


During handling of the above exception, another exception occurred:

RRuntimeError                             Traceback (most recent call last)
<ipython-input-3-5df30d28440c> in <module>()
      4     arules = importr('arules', robject_translations = d, lib_loc = "D:/Anaconda3/Lib/site-packages/rpy2/R/win-library/3.4")
      5 except:
----> 6     arules = importr('arules', robject_translations = d, lib_loc = "C:/Program Files/R/R-3.4.4/library")
      7 

~\Anaconda3\lib\site-packages\rpy2\robjects\packages.py in importr(name, lib_loc, robject_translations, signature_translation, suppress_messages, on_conflict, symbol_r2python, symbol_check_after, data)
    451     if _package_has_namespace(rname, 
    452                               _system_file(package = rname)):
--> 453         env = _get_namespace(rname)
    454         version = _get_namespace_version(rname)[0]
    455         exported_names = set(_get_namespace_exports(rname))

RRuntimeError: Error in loadNamespace(name) : there is no package called 'arules'

So the R package could not be imported into Python from either library path.

The same steps worked for DirichletReg, and I do not know why arules behaves differently.

Can anyone help me with this?
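
One check that may help narrow this down is to see whether either library path really contains arules. The sketch below assumes the standard R layout, where an installed package is a folder named after the package with a DESCRIPTION file inside a library directory. The helper name find_r_package is made up for illustration and is not part of rpy2; it only looks at the file system, so it works regardless of which R installation rpy2 is bound to.

```python
from pathlib import Path


def find_r_package(name, lib_paths):
    """Return the library directories that contain an installed copy of `name`.

    Hypothetical helper (not part of rpy2): it assumes an installed R package
    is a folder <library>/<name> with a DESCRIPTION file in it.
    """
    found = []
    for lib in lib_paths:
        if (Path(lib) / name / "DESCRIPTION").is_file():
            found.append(str(lib))
    return found


if __name__ == "__main__":
    # The two library paths reported by .libPaths() in the question.
    libs = [
        "D:/Anaconda3/Lib/site-packages/rpy2/R/win-library/3.4",
        "C:/Program Files/R/R-3.4.4/library",
    ]
    for pkg in ("arules", "DirichletReg"):
        print(pkg, "->", find_r_package(pkg, libs) or "not found in these libraries")
```

If DirichletReg is found in one of the two directories and arules in neither, arules was probably installed into a different library than the ones passed as lib_loc.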


Solution

  • As far as I could find, there is no direct equivalent of read.transactions in Python. However, there is a workaround that reproduces what it does. For reference, this is the result in R:

    > groceries <- read.transactions("groceries.csv", sep = ",")
    > summary(groceries)
    transactions as itemMatrix in sparse format with
    9835 rows (elements/itemsets/transactions) and
    169 columns (items) and a density of 0.02609146
    

    In a Python Jupyter notebook:

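    1) Download the data and save it to a text file

    import requests
    url = 'https://raw.githubusercontent.com/stedy/Machine-Learning-with-R-datasets/master/groceries.csv'
    grocery_dataset = requests.get(url)
    grocery_dataset.raise_for_status()  # stop here if the download failed
    # Save the response text to a file; the with block closes it automatically
    with open('grocery_dataset.txt', 'w') as f:
        f.write(grocery_dataset.text)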
    

    2) Build one 0/1 row per transaction, with one column per distinct item

    import csv
    # First pass: collect every distinct item that appears in any transaction
    grocery_items = set()
    with open("grocery_dataset.txt") as f:
        reader = csv.reader(f, delimiter=",")
        for line in reader:
            grocery_items.update(line)
    # Second pass: one dict per transaction, 1 if the item is present, else 0
    output_list = list()
    with open("grocery_dataset.txt") as f:
        reader = csv.reader(f, delimiter=",")
        for line in reader:
            row_val = {item:0 for item in grocery_items}
            row_val.update({item:1 for item in line})
            output_list.append(row_val)
    

    3) Save it as a pandas DataFrame

    import pandas as pd
    grocery_df = pd.DataFrame(output_list)
    

    Then

    grocery_df.shape
    

    returns

    (9835, 169)
    

    which matches the number of rows (9835 transactions) and columns (169 items) reported by summary(groceries) in R.
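
The encoding from step 2 can be tried on a small in-memory sample before running it on the full file. The sketch below is only an illustration: the three transactions and the helper name one_hot_transactions are made up, and the density is computed as the fraction of cells equal to 1, which is assumed to correspond to the density figure in the R summary.

```python
import csv
import io


def one_hot_transactions(lines):
    """Turn rows of comma-separated items into 0/1 dictionaries, one per row."""
    transactions = [row for row in csv.reader(lines, delimiter=",")]
    # Every distinct item becomes one column, sorted for a stable order.
    items = sorted({item for row in transactions for item in row})
    rows = []
    for row in transactions:
        encoded = dict.fromkeys(items, 0)
        encoded.update((item, 1) for item in row)
        rows.append(encoded)
    return items, rows


# Made-up sample data standing in for groceries.csv.
sample = io.StringIO("milk,bread\nbread,butter,jam\nmilk\n")
items, rows = one_hot_transactions(sample)

n_rows, n_cols = len(rows), len(items)
# Fraction of cells that are 1 (assumed to match the density in R's summary).
density = sum(sum(r.values()) for r in rows) / (n_rows * n_cols)

print((n_rows, n_cols))   # (3, 4)
print(rows[0])
print(round(density, 4))  # 0.5
```

Passing rows to pd.DataFrame, as in step 3, should then give a frame with the same shape as the printed tuple.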