Pass PCA preprocessing arguments to train()


I'm trying to build a predictive model in caret using PCA as pre-processing. The pre-processing would be as follows:

preProc <- preProcess(IL_train[,-1], method="pca", thresh = 0.8)

Is it possible to pass the thresh argument directly to caret's train() function? I've tried the following, but it doesn't work:

modelFit_pp <- train(IL_train$diagnosis ~ . , preProcess="pca",
                            thresh= 0.8, method="glm", data=IL_train)

If not, how can I pass the separate preProc results to the train() function?


Solution

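  • As per the documentation, additional preprocessing arguments go in trainControl(), through its preProcOptions argument. The pre-processing method itself (here "pca") is still passed to train() via preProcess.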

    ?trainControl
    
    ...
    preProcOptions  
    
    A list of options to pass to preProcess. The type of pre-processing 
    (e.g. center, scaling etc) is passed in via the preProc option in train.
    ...
    

    Since your dataset is not reproducible, let's look at an example. I will use the Sonar dataset from mlbench and use the pls algorithm just for fun.

    library(caret)
    library(mlbench)
    
    data(Sonar)
    
    ctrl <- trainControl(preProcOptions = list(thresh = 0.95))
    
    mod <- train(Class ~ .,
                 data = Sonar,
                 method = "pls",
                 preProcess = "pca",  # without this, preProcOptions is ignored
                 trControl = ctrl)
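
To check that the threshold actually reached preProcess, one option is to inspect the preProcess object stored on the fitted model. The following is a minimal sketch rather than part of the original answer: it assumes a glm model as in the question, a threshold of 0.8, and 5-fold cross-validation chosen only to keep the run short. The names `fit` and `ctrl_check` are illustrative.

```r
library(caret)
library(mlbench)

data(Sonar)
set.seed(1)

# Assumption: light resampling so the sketch runs quickly
ctrl_check <- trainControl(method = "cv", number = 5,
                           preProcOptions = list(thresh = 0.8))

# preProcess = "pca" turns PCA on; thresh comes from preProcOptions
fit <- train(Class ~ ., data = Sonar,
             method = "glm",
             preProcess = "pca",
             trControl = ctrl_check)

fit$preProcess$thresh   # variance threshold recorded by preProcess
fit$preProcess$numComp  # number of principal components retained
```

If `thresh` shows the value you set and `numComp` is smaller than the number of original predictors, the option was picked up.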
    

    Although documentation isn't the most exciting read, it is well worth going through. Package authors work hard to create it, and there are many wonders to be found within.
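
As for the second part of the question (using separate preProcess results), one possible approach is to apply the transformation yourself with predict() and train on the resulting components. This is a sketch under assumptions, not something the answer above prescribes: it uses Sonar in place of IL_train, and the names `predictors`, `pp`, `pcs` and `fit_manual` are illustrative.

```r
library(caret)
library(mlbench)

data(Sonar)
set.seed(1)

# All columns except the outcome
predictors <- Sonar[, names(Sonar) != "Class"]

# Estimate PCA once, keeping enough components to explain 80% of the variance
pp <- preProcess(predictors, method = "pca", thresh = 0.8)

# Project the predictors onto the retained components (PC1, PC2, ...)
pcs <- predict(pp, predictors)

# Train on the components using the x/y interface
fit_manual <- train(x = pcs, y = Sonar$Class,
                    method = "glm",
                    trControl = trainControl(method = "cv", number = 5))
```

Note that with this route the PCA is estimated once on the full training set rather than inside each resample, so the resampled performance estimate can be somewhat optimistic. New data also has to go through predict(pp, newdata) before it is passed to predict(fit_manual, ...). Letting train() handle it via preProcess and preProcOptions avoids both issues.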