Tags: python, r, rpy2

Cubist regression under rpy2: "subscript out of bounds" error


When I use rpy2 to run a Cubist regression, I get this error:

Error in strsplit(tmp, "\"")[[1]] : subscript out of bounds

I tried converting the data with as.matrix, but it still doesn't work.

import rpy2.robjects as robjects
from rpy2.robjects.packages import importr
from rpy2.robjects.vectors import FloatVector
from rpy2.robjects import pandas2ri
Cubist = importr('Cubist')
lattice = importr('lattice')
r = robjects.r
# Prepare the sample data
dt = r('mtcars')
Z = FloatVector(dt[3])   # hp (rpy2 indexes columns from 0)
X = FloatVector(dt[5])   # wt
X1 = FloatVector(dt[6])  # qsec
T = r['cbind'](X,X1)

regr = r['cubist'](x=T,y=Z,committees=10)

Solution

  • When x is a matrix, cubist() appears to require column names, i.e. a dimnames attribute.

    Setup in R:

    library(Cubist)
    
    dt = mtcars
    Z = dt[, 4]
    X = dt[, 6]
    X1 = dt[, 7]
    

    Now compare this (reproduces your error):

    > T = cbind(dt[, 6], dt[, 7])
    > str(T)
     num [1:32, 1:2] 2.62 2.88 2.32 3.21 3.44 ...
    > cubist(x=T, y=Z, committees=10)
    cubist code called exit with value 1
    Error in strsplit(tmp, "\"")[[1]] : subscript out of bounds
    

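    vs. binding the named variables, where cbind() takes the column names from the variable names X and X1: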

    > T = cbind(X, X1)
    > str(T)
     num [1:32, 1:2] 2.62 2.88 2.32 3.21 3.44 ...
     - attr(*, "dimnames")=List of 2
      ..$ : NULL
      ..$ : chr [1:2] "X" "X1"
    > cubist(x=T, y=Z, committees=10)
    
    Call:
    cubist.default(x = T, y = Z, committees = 10)
    
    Number of samples: 32
    Number of predictors: 2
    
    Number of committees: 10
    Number of rules per committee: 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
    

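    There are multiple ways to ensure the dimnames get attached via rpy2. With positional arguments, rpy2 hands the vectors to R as anonymous values, so cbind() has no variable names to pick up. One easy fix with your code is to pass the vectors as named arguments: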

    In [15]: T = r['cbind'](X=X,X1=X1)
    
    In [16]: print(r['str'](T))
     num [1:32, 1:2] 2.62 2.88 2.32 3.21 3.44 ...
     - attr(*, "dimnames")=List of 2
      ..$ : NULL
      ..$ : chr [1:2] "X" "X1"
    <rpy2.rinterface.NULLType object at 0x7f0d7c0f5608> [RTYPES.NILSXP]
    
    In [17]: print(r['cubist'](x=T,y=Z,committees=10))
    
    Call:
    cubist.default(x = structure(c(2.62, 2.875, 2.32, 3.215, 3.44, 3.46,
     205, 215, 230, 66, 52, 65, 97, 150, 150, 245, 175, 66, 91, 113, 264, 175,
     335, 109), committees = 10L)
    
    Number of samples: 32
    Number of predictors: 2
    
    Number of committees: 10
    Number of rules per committee: 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
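
If the number of predictors varies, the named-argument fix above can be wrapped in a small helper. This is only a sketch: `column_kwargs` and `named_matrix` are hypothetical names (not part of rpy2 or Cubist), and it assumes that passing keyword arguments to `r['cbind']` attaches the column names, as shown in the session above.

```python
def column_kwargs(names, vectors):
    """Pair each column name with its vector so cbind() receives named arguments."""
    names = list(names)
    vectors = list(vectors)
    if len(names) != len(vectors):
        raise ValueError("need exactly one name per column")
    if any(not n for n in names) or len(set(names)) != len(names):
        raise ValueError("column names must be non-empty and unique")
    return dict(zip(names, vectors))


def named_matrix(names, vectors):
    """Bind the columns with R's cbind(), passing the names as keyword arguments."""
    # Imported lazily: this part needs a working R installation plus rpy2.
    from rpy2.robjects import r
    return r["cbind"](**column_kwargs(names, vectors))


# Usage with the vectors from the question (not executed here):
# T = named_matrix(["X", "X1"], [X, X1])
# regr = r["cubist"](x=T, y=Z, committees=10)
```

Checking the names in Python first means a missing or duplicated column name fails with a readable ValueError rather than surfacing later inside cubist().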