I am trying to predict ages using linear regression in R. Basically I am using gene expression data to predict the ages, so the columns you see here are genes.
Here is a small subset of the data (this is the full data set, called age_pred, not the training split):
structure(list(age = c(47, 39, 37, 8, 42, 45, 49, 43, 39, 48),
HNRNPA0 = c(29.73446, 29.92989, 31.95408, 32.08738, 30.9989,
31.73896, 30.79453, 31.47219, 31.81943, 30.88048), ABHD2 = c(32.9946265323029,
32.7362770559135, 34.331705505806, 33.7107749955508, 33.4347574459267,
34.5282535270287, 33.8085246495487, 33.4646375518867, 33.4936237157377,
32.3604653643843), CYB5R3 = c(35.58433, 35.56673, 37.35725,
35.05798, 35.36807, 36.20249, 34.61598, 36.41034, 37.95884,
35.03965), RPRD2 = c(32.80401, 34.05659, 34.20036, 33.90712,
33.21673, 33.75369, 33.64168, 34.37718, 32.62894, 32.84124
), GRINA = c(35.02339, 34.49548, 35.43786, 35.73121, 34.2059,
34.6569, 33.86705, 35.63485, 34.88564, 34.44139), SEC61A1 = c(34.32433,
35.17745, 35.93087, 35.91407, 35.04778, 34.98187, 34.6524,
36.05048, 35.16417, 33.89892), HSPA5 = c(32.983, 33.15406,
35.41871, 35.88919, 34.10364, 34.23049, 33.81859, 35.34636,
34.51912, 33.10022), ARF3 = c(33.7404667070002, 32.4284787643714,
34.9797780950407, 35.5112520700914, 33.5425535496703, 34.5253494533377,
33.8143672021478, 34.1443535341306, 34.8727981424934, 33.7736424939363
), LAMC1 = c(33.58156, 34.4972, 36.386, 35.24869, 35.20215,
35.89395, 35.654, 36.31492, 34.99312, 35.20289)), row.names = c("EA595454",
"EA595500", "EA595522", "EA595529", "EA595597", "EA595624", "EA595632",
"EA595635", "EA595647", "EA595654"), class = "data.frame")
Code:
DEX = createDataPartition(y = age_pred$age, p=0.8, list = FALSE)
age_trn = age_pred[DEX, ]
age_tst = age_pred[-DEX,]
ctrlCV = trainControl(method = 'cv', number = 5 , classProbs = FALSE , savePredictions = TRUE, summaryFunction = twoClassSummary )
ageModel <- caret::train(age ~ ., data = age_trn,
method = 'lm',
trControl = ctrlCV)
And the error:
Error in sensitivity.default(data[, "pred"], data[, "obs"], lev[1]) :
inputs must be factors
According to glimpse(age_pred), all features in the data are of type dbl. Here are some of them:
$ age <dbl> 61, 59, 30, 64, 67, 71, 65, 61, 70, 48, 64, 77, 73, 40, 58, 62, 79, 53, 60, 68, 71, 52, 54, 50, 70, 53, 67, 67, 71, 72, 54,…
$ HNRNPA0 <dbl> 29.92989, 31.95408, 32.08738, 30.99890, 31.73896, 30.79453, 31.47219, 31.81943, 30.88048, 31.83250, 32.70315, 32.06897, 30.…
$ ABHD2 <dbl> 32.73628, 34.33171, 33.71077, 33.43476, 34.52825, 33.80852, 33.46464, 33.49362, 32.36047, 34.25793, 34.30586, 33.86784, 32.…
$ CYB5R3 <dbl> 35.56673, 37.35725, 35.05798, 35.36807, 36.20249, 34.61598, 36.41034, 37.95884, 35.03965, 36.54919, 36.39444, 34.95226, 35.…
$ RPRD2 <dbl> 34.05659, 34.20036, 33.90712, 33.21673, 33.75369, 33.64168, 34.37718, 32.62894, 32.84124, 33.39123, 34.20990, 33.00906, 32.…
$ GRINA <dbl> 34.49548, 35.43786, 35.73121, 34.20590, 34.65690, 33.86705, 35.63485, 34.88564, 34.44139, 35.44804, 35.09964, 34.30946, 34.…
$ SEC61A1 <dbl> 35.17745, 35.93087, 35.91407, 35.04778, 34.98187, 34.65240, 36.05048, 35.16417, 33.89892, 35.25823, 34.81930, 34.82199, 34.…
$ HSPA5 <dbl> 33.15406, 35.41871, 35.88919, 34.10364, 34.23049, 33.81859, 35.34636, 34.51912, 33.10022, 34.17081, 35.64166, 34.21163, 33.…
$ ARF3 <dbl> 32.42848, 34.97978, 35.51125, 33.54255, 34.52535, 33.81437, 34.14435, 34.87280, 33.77364, 34.44382, 34.84120, 33.96720, 32.…
$ LAMC1 <dbl> 34.49720, 36.38600, 35.24869, 35.20215, 35.89395, 35.65400, 36.31492, 34.99312, 35.20289, 35.34522, 35.51326, 35.87105, 35.…
$ MBD3 <dbl> 27.91208, 29.42368, 27.11015, 28.48502, 29.10552, 29.30748, 27.87883, 30.76615, 26.77972, 29.42166, 27.70776, 32.48756, 34.…
I don't understand why it wants the inputs to be factors. That doesn't make sense, since linear regression needs numeric values!
What is causing this error? Is my code faulty anywhere?
The problem was with the cross-validation parameters:
ctrlCV = trainControl(method = 'cv', number = 5 , classProbs = FALSE , savePredictions = TRUE, summaryFunction = twoClassSummary )
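summaryFunction = twoClassSummary
caused this error. twoClassSummary computes ROC, sensitivity, and specificity, which are only defined for two-class classification, so it expects the observed and predicted values to be factors. With a numeric outcome such as age, the call to sensitivity() fails with "inputs must be factors". Removing that argument lets caret fall back to its default summary, which reports regression metrics such as RMSE and R-squared. I'm posting this in case anyone faces the same problem in the future.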
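For reference, here is a minimal sketch of the corrected call. It is not the original data: the small age_pred data frame below is simulated (random values, with only three of the gene columns) purely so the example runs on its own, and set.seed is added only for reproducibility. The only change relative to the code in the question is that summaryFunction (and the unneeded classProbs) are dropped from trainControl.

```r
library(caret)

# Simulated stand-in for age_pred (assumption: random values, NOT real expression data)
set.seed(1)
n <- 100
age_pred <- data.frame(
  age     = round(runif(n, 20, 80)),
  HNRNPA0 = rnorm(n, mean = 31,   sd = 1),
  ABHD2   = rnorm(n, mean = 33.5, sd = 0.7),
  CYB5R3  = rnorm(n, mean = 36,   sd = 1)
)

# Same 80/20 split as in the question
DEX     <- createDataPartition(y = age_pred$age, p = 0.8, list = FALSE)
age_trn <- age_pred[DEX, ]
age_tst <- age_pred[-DEX, ]

# No summaryFunction: caret uses its default summary, which handles numeric outcomes
ctrlCV <- trainControl(method = "cv", number = 5, savePredictions = TRUE)

ageModel <- caret::train(age ~ ., data = age_trn,
                         method = "lm",
                         trControl = ctrlCV)

print(ageModel$results)                        # cross-validated regression metrics

# Evaluate on the held-out rows
pred <- predict(ageModel, newdata = age_tst)
print(postResample(pred = pred, obs = age_tst$age))
```

Because the simulated predictors are unrelated to age, the reported metrics here are meaningless; the point is only that train() now runs without the "inputs must be factors" error.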