Training a model on a dataset in RStudio


I am splitting the dataset into 70% for training and 30% for validation. It contains many NaN values, which may be why I cannot train the model. I was able to split the data into training and test sets, but when I try to train, I get this error: "Error in na.fail.default(list(ndvi = c(0.426755102040816, 0.409, 0.501735849056604, : missing values in object".
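The `na.fail.default` in the message points at the cause: the model is being built with `na.action = na.fail` (which, as far as the caret documentation describes it, is the default of the formula interface of `train()`), and `na.fail` stops as soon as any row has a missing value. `NaN` counts as missing here. A minimal sketch of the same behavior with base R's `glm()`, where the `toy` data frame and all of its values are made up for illustration and are not the data from the question:

```r
# Made-up stand-in for the real data (values are assumptions)
set.seed(1)
toy <- data.frame(ndvi = runif(20, 0.2, 0.8))
toy$DMY_kg_ha <- 1000 + 5000 * toy$ndvi + rnorm(20, sd = 200)
toy$ndvi[c(3, 11)] <- NaN   # is.na(NaN) is TRUE, so these rows are "missing"

# na.fail refuses any data with missing values; try() captures the error
res <- try(glm(DMY_kg_ha ~ ndvi, data = toy, na.action = na.fail), silent = TRUE)
print(inherits(res, "try-error"))

# na.omit drops the incomplete rows before fitting instead of stopping
fit <- glm(DMY_kg_ha ~ ndvi, data = toy, na.action = na.omit)
print(nobs(fit))   # number of rows actually used in the fit
```

Dropping rows is only one option; whether it is acceptable depends on how many rows are lost and why the values are missing.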

I want to estimate biomass using NDVI and then see the relationship with the observed biomass.

set.seed(123)
inTrain = createDataPartition(newdata$ndvi, p = 0.7, list = FALSE)
training = newdata[ inTrain,]
testing = newdata[-inTrain,]
cols <- c("ndvi", "first", "second", "third","DMY_kg_ha")
newdata[cols] <- lapply(newdata[cols], factor)  ## as.factor() could also be used
set.seed(32343)
modelFit<-train(DMY_kg_ha~first+second+third+treatment, data=training, method='glm',na.rm = na.omit)
modelFit

After creating modelFit, I want to use vif in R to find out which variables are important.
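On the VIF step: a variance inflation factor measures how strongly one predictor is explained by the other predictors (multicollinearity), which is not the same thing as variable importance. The sketch below computes it by hand in base R on made-up numeric predictors; `vif_by_hand`, `toy`, `x1`, `x2`, and `x3` are hypothetical names introduced for illustration.

```r
# Made-up predictors; x2 is deliberately close to x1 to create collinearity
set.seed(42)
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)
x3 <- rnorm(n)
toy <- data.frame(x1, x2, x3)

# Hypothetical helper: VIF_j = 1 / (1 - R^2_j), where R^2_j comes from
# regressing predictor j on the remaining predictors (numeric predictors only)
vif_by_hand <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    r2 <- summary(lm(reformulate(others, response = p), data = data))$r.squared
    1 / (1 - r2)
  })
}

vifs <- vif_by_hand(toy, c("x1", "x2", "x3"))
print(round(vifs, 2))   # x1 and x2 inflate each other; x3 stays near 1
```

If the car package is installed, `car::vif()` applied to the fitted glm (for a caret model, probably `modelFit$finalModel`) should give the same kind of numbers without the helper. For a ranking of predictors by importance, caret's `varImp(modelFit)` is likely the closer fit.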


Solution

  • Try this: remove the rows with missing values before splitting the data.

    # load library
    library(caret)
    
    # set seed value
    set.seed(123)
    
    # remove NA's in data
    newdata = na.omit(newdata)
    
    # split data set
    inTrain = createDataPartition(newdata$ndvi, p = 0.7, list = FALSE)
    training = newdata[ inTrain,]
    testing = newdata[-inTrain,]
    
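    # convert columns to factors
    # note: this runs after the split, so it only changes newdata; training and
    # testing keep their original column types. If you want a regression on
    # biomass, DMY_kg_ha and ndvi should stay numeric anyway.
    cols <- c("ndvi", "first", "second", "third","DMY_kg_ha")
    newdata[cols] <- lapply(newdata[cols], factor)  ## as.factor() could also be used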
    
    # reset seed value
    set.seed(32343)
    
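    # train model (the formula method of train() takes na.action, not na.rm;
    # rows with NAs were already dropped by na.omit() above, so this is a safeguard)
    modelFit <- train(DMY_kg_ha ~ first + second + third + treatment, data = training, method = 'glm', na.action = na.omit)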
    
    # view model
    modelFit
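
Once a model is fitted, the relationship between estimated and observed biomass can be checked on the held-out 30%. The following is a minimal sketch in base R, assuming a plain Gaussian GLM of biomass on NDVI; the `toy` data frame and its values are made up, and only the column names follow the question.

```r
# Made-up data standing in for newdata (values are assumptions)
set.seed(123)
n <- 200
toy <- data.frame(ndvi = runif(n, 0.2, 0.9))
toy$DMY_kg_ha <- 500 + 6000 * toy$ndvi + rnorm(n, sd = 300)

# Simple random 70/30 split in base R
idx      <- sample(seq_len(n), size = floor(0.7 * n))
training <- toy[idx, ]
testing  <- toy[-idx, ]

# Gaussian GLM, i.e. ordinary linear regression of biomass on NDVI
fit <- glm(DMY_kg_ha ~ ndvi, data = training)

# Predict biomass for the held-out rows and compare with the observations
pred <- predict(fit, newdata = testing)
obs  <- testing$DMY_kg_ha
rmse <- sqrt(mean((pred - obs)^2))   # typical size of the prediction error
r2   <- cor(pred, obs)^2             # squared correlation, observed vs predicted
cat("RMSE:", round(rmse, 1), " R-squared:", round(r2, 3), "\n")
```

With the caret model from the answer, `predict(modelFit, newdata = testing)` should play the role of `pred`, and caret's `postResample(pred, obs)` reports similar summary metrics. A scatter plot such as `plot(obs, pred); abline(0, 1)` shows how far the estimates fall from the 1:1 line.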