I have the following R script, which builds a model from a data frame using the caret package:
library(caret)
library(broom)
data <- read.csv("mydata.csv")
splitprob <- 0.8
traintestindex <- createDataPartition(data$fluorescence, p=splitprob, list=F)
testset <- data[-traintestindex,]
trainingset <- data[traintestindex,]
cvCtrl <- trainControl(method = "repeatedcv", number = 20, repeats = 100)
model <- train(fluorescence~., trainingset, method = "glmStepAIC", preProc = c("center","scale"), trControl = cvCtrl)
final_model<- tidy(model$finalModel)
write.csv(final_model, "model_glm.csv")
I would like to reproduce this functionality in a Python script. Once a pandas data frame has been generated, it should be converted into an R data frame and passed to caret's train function with the same parameters as in the R script above.
import pandas as pd
from rpy2.robjects import r
import sys
import rpy2.robjects.packages as rpackages
from rpy2.robjects.vectors import StrVector
from rpy2.robjects import r, pandas2ri
pandas2ri.activate()
caret = rpackages.importr('caret')
broom= rpackages.importr('broom')
my_data= pd.read_csv("my_data.csv")
r_dataframe= pandas2ri.py2ri(my_data)
preprocessing= ["center", "scale"]
center_scale= StrVector(preprocessing)
cvCtrl = caret.trainControl(method = "repeatedcv", number= 20, repeats = 100)
model_R= caret.train("fluorescence~.", data= r_dataframe, method = "glmStepAIC", preProc = center_scale, trControl = cvCtrl)
print(model_R.finalModel)
However, this script is evidently not configured properly: running it with rpy2 yields SyntaxError: invalid syntax at the line
model_R= caret.train("fluorescence~., r_dataframe, method = "glmStepAIC", preProc = center_scale, trControl = cvCtrl")
I have tried to follow the syntax given in the documentation (source: https://rpy2.github.io/doc/latest/html/introduction.html?highlight=linear%20model), but examples of how to set up code like this are sparse.
What must be fixed in my Python code so that I can build a model from my data frame?
I figured out the format for implementing the caret functions via rpy2:
import pandas as pd
import rpy2.robjects.packages as rpackages
from rpy2.robjects.vectors import StrVector
from rpy2.robjects import r, pandas2ri
pandas2ri.activate()
caret = rpackages.importr('caret')
broom= rpackages.importr('broom')
my_data= pd.read_csv("my_data.csv")
r_dataframe= pandas2ri.py2ri(my_data)
preprocessing= ["center", "scale"]
center_scale= StrVector(preprocessing)
#these are the columns in my data frame that will consist of my predictors in the model
predictors= ['predictor1','predictor2','predictor3']
predictors_vector= StrVector(predictors)
#this column from the dataframe consists of the outcome of the model
outcome= ['fluorescence']
outcome_vector= StrVector(outcome)
#this line extracts the columns of the predictors from the dataframe
columns_predictors= r_dataframe.rx(True, predictors_vector)
#this line extracts the column of the outcome from the dataframe
column_response= r_dataframe.rx(True, outcome_vector)
cvCtrl = caret.trainControl(method = "repeatedcv", number= 20, repeats = 100)
model_R= caret.train(columns_predictors, column_response, method = "glmStepAIC", preProc = center_scale, trControl = cvCtrl)
print(model_R.rx('finalModel'))
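As a side note on the SyntaxError itself: it appears to come from Python's parser rather than from rpy2 or caret. In the line quoted in the error message, the closing quote after fluorescence~. is missing and an extra quote sits just before the final parenthesis, so Python sees a string, then the bare name glmStepAIC, then another string. Below is a minimal sketch that checks this with the built-in compile(), assuming the two lines are exactly as quoted in the question. The helper name parses is hypothetical, and nothing here calls R or rpy2.

```python
# The two call lines from the question, kept as plain strings so they can be
# parsed without rpy2 or R being installed.
line_from_error = 'model_R= caret.train("fluorescence~., r_dataframe, method = "glmStepAIC", preProc = center_scale, trControl = cvCtrl")'
line_from_script = 'model_R= caret.train("fluorescence~.", data= r_dataframe, method = "glmStepAIC", preProc = center_scale, trControl = cvCtrl)'


def parses(source):
    """Return True if Python can parse `source`; the code is never executed."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False


print(parses(line_from_error))   # False: the quotes are misplaced
print(parses(line_from_script))  # True: every string literal is closed
```

Passing the parser only means the quoting is right. Whether caret accepts the arguments is a separate matter, which is why the code above passes the predictors and the outcome explicitly.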