Non-conformable array error when using rpart with rpy2


I'm using rpart with rpy2 (version 2.8.6) on Python 3.5, and I want to train a decision tree for classification. My code snippet looks like this:

import rpy2.robjects.packages as rpackages
from rpy2.robjects.packages import importr
from rpy2.robjects import numpy2ri
from rpy2.robjects import pandas2ri
from rpy2.robjects import DataFrame, Formula
rpart = importr('rpart')
numpy2ri.activate()
pandas2ri.activate()

dataf = DataFrame({'responsev': owner_train_label,
               'predictorv': owner_train_data})
formula = Formula('responsev ~.')
clf = rpart.rpart(formula = formula, data = dataf, method = "class", control=rpart.rpart_control(minsplit = 10, xval = 10))

Here owner_train_label is a numpy float64 array of shape (12610,) and owner_train_data is a numpy float64 array of shape (12610, 88).

This is the error I'm getting when I run the last line of code to fit the data.

RRuntimeError: Error in ((xmiss %*% rep(1, ncol(xmiss))) < ncol(xmiss)) & !ymiss : 
non-conformable arrays

I understand that the error says the arrays are non-conformable, but I don't know why: with the same training data, I can train sklearn's decision tree successfully. Thanks for your help.


Solution

  • I got around this by building the dataframe with pandas and passing that pandas dataframe to rpart, letting rpy2's pandas2ri convert it to an R data frame.

    import pandas as pd
    from rpy2.robjects.packages import importr
    from rpy2.robjects import pandas2ri
    from rpy2.robjects import Formula

    rpart = importr('rpart')
    pandas2ri.activate()  # convert pandas objects to R objects automatically

    # One column per predictor, plus the label column 'l'
    df = pd.DataFrame(data = owner_train_data)
    df['l'] = owner_train_label

    # 'l ~ .' models the label against all remaining columns
    formula = Formula('l ~.')
    clf = rpart.rpart(formula = formula, data = df, method = "class", control=rpart.rpart_control(minsplit = 10, xval = 10))
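
The pandas step in this workaround can be sketched on its own, without R, to check that the frame has one column per predictor plus the label before it is handed to rpart. This is a minimal sketch, not part of the original answer: the helper name `build_training_frame`, the `f0`, `f1`, ... column names, and the synthetic `demo_data` / `demo_label` arrays are all hypothetical stand-ins for `owner_train_data` and `owner_train_label`.

```python
import numpy as np
import pandas as pd


def build_training_frame(features, labels, label_col="l"):
    """Combine a 2-D feature array and a 1-D label array into one DataFrame.

    Hypothetical helper: the column names f0, f1, ... are an assumption,
    chosen so that every column has a plain string name.
    """
    features = np.asarray(features)
    labels = np.asarray(labels)
    if features.ndim != 2 or labels.ndim != 1:
        raise ValueError("expected 2-D features and 1-D labels")
    if features.shape[0] != labels.shape[0]:
        raise ValueError("features and labels must have the same number of rows")
    columns = ["f{}".format(i) for i in range(features.shape[1])]
    df = pd.DataFrame(features, columns=columns)
    df[label_col] = labels
    return df


# Synthetic stand-ins for owner_train_data and owner_train_label
rng = np.random.RandomState(0)
demo_data = rng.rand(20, 4)
demo_label = (demo_data[:, 0] > 0.5).astype(np.float64)

demo_df = build_training_frame(demo_data, demo_label)
print(demo_df.shape)          # (20, 5): four predictors plus the label
print(list(demo_df.columns))  # ['f0', 'f1', 'f2', 'f3', 'l']
```

A frame built this way would be passed to rpart exactly as in the answer, with `Formula('l ~.')` and `data = demo_df`. Giving the predictors explicit string names is a design choice rather than a requirement; it just avoids relying on how integer column labels are translated on the R side.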