I have a logistic regression model. I would like to predict the morphology of items in multiple dataframes that have been put into a list.
I have lots of dataframes (most say working with a list of dataframes is better).
I need help with 1:
I am not sure whether it is better to have the 1000 dataframes separately and predict using loops etc, or to continue having them inside a list.
Prior to this code I have split my data into train and test sets. I then trained the model using:
library(nnet)
#Training the multinomial model
multinom_model <- multinom(Morphology ~ ., data=morph, maxit=500)
#Checking the model
summary(multinom_model)
This was then followed by validation etc.
My new dataset, consisting of multiple dataframes stored in a list, called rose.list was formatted by the following:
filesrose <- list.files(pattern = "_rose.csv")
#Rename all files of rose dataset 'rose.i'
for (i in seq_along(filesrose)) {
assign(paste("rose", i, sep = "."), read.csv(filesrose[i]))
}
#Make a list of the dataframes
rose.list <- lapply(ls(pattern="rose."), function(x) get(x))
I have been using this function to predict on a singular new dataframe
# Predicting the classification for individual datasets
rose.1$Morph <- predict(multinom_model, newdata=rose.1, "class")
Which gives me the dataframe, with the new prediction column 'Morph'
But how would I do this for multiple dataframes in my rose.list? I have tried:
lapply(rose.list, predict(multinom_model, "class"))
Error in eval(predvars, data, env) : object 'Area' not found
and, but also has the error:
lapply(rose.list, predict(multinom_model, newdata = rose.list, "class"))
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows:
You can use an anonymous function (those with function(x)
or abbreviated \(x)
).
library(nnet)
multinom_model <- multinom(low ~ ., birthwt)
lapply(df_list, \(x) predict(multinom_model, newdata=x, type='class'))
# $rose_1
# [1] 1 0 1 1 0 0 0 1 0 1 1 1 0 0 1 1 0 0 1 0 0 1 0 0 0 1 0 0 0 0 1 1 1 0 0 1 0 1 0
# [40] 1 0 0 0 0 0 1 1 1 0 1 1 0 1 1 0 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0 1 1 1 1 1 0 0 1
# [79] 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 1 0 0 0 0 0 1 1 1 0 0 0 0 0 1 1 0
# [118] 1 0 0 1 1 0 1 0 0 0 1 1 0 1 1 1 0 1 0 1 1 0 0 0 0 1 0 0 0 0 0 1 0 0 1 0 0 0 1
# [157] 1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 1 0 1 0 0 0 0 1 0 1 1 1 1 0 0 1
# Levels: 0 1
#
# $rose_2
# [1] 0 1 0 1 1 0 1 0 0 1 0 0 1 0 1 0 0 0 0 1 0 1 1 0 1 1 1 1 0 0 1 0 0 1 0 1 1 0 1
# [40] 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 1 1 1 0 1 1 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0 1 1
# [79] 1 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0
# [118] 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 1 0 1 0 1 1 0 1 1 0 0 0 1 0 0 1 0 0 0 1 0
# [157] 0 0 0 1 1 1 1 1 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0
# Levels: 0 1
#
# $rose_3
# [1] 0 0 0 0 1 1 0 1 1 0 0 1 0 0 0 0 1 1 1 1 0 1 0 0 0 0 0 0 1 0 0 0 0 1 1 1 0 0 1
# [40] 0 0 0 1 1 0 0 0 1 1 0 0 0 1 0 1 1 1 1 0 0 0 1 0 1 0 1 1 0 1 0 0 1 0 0 0 0 1 1
# [79] 0 1 1 0 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 1 0 0 0 1 0 0 1 0 1 0 1
# [118] 0 0 0 0 1 0 1 0 1 1 1 1 0 0 0 1 0 0 1 1 1 1 0 1 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0
# [157] 0 1 0 0 1 1 1 0 0 1 0 0 1 0 0 1 0 1 0 0 0 0 1 0 0 1 0 1 1 0 0 0 0
# Levels: 0 1
To add the predictions as new column to each data frame in the list, modify the code like so:
res <- lapply(df_list, \(x) cbind(x, pred=predict(multinom_model, newdata=x, type="class")))
lapply(res, head)
# $rose_1
# low age lwt race smoke ptl ht ui ftv bwt pred
# 136 0 24 115 1 0 0 0 0 2 3090 0
# 154 0 26 133 3 1 2 0 0 0 3260 0
# 34 1 19 112 1 1 0 0 1 0 2084 1
# 166 0 16 112 2 0 0 0 0 0 3374 0
# 27 1 20 150 1 1 0 0 0 2 1928 1
# 218 0 26 160 3 0 0 0 0 0 4054 0
#
# $rose_2
# low age lwt race smoke ptl ht ui ftv bwt pred
# 167 0 16 135 1 1 0 0 0 0 3374 0
# 26 1 25 92 1 1 0 0 0 0 1928 1
# 149 0 23 119 3 0 0 0 0 2 3232 0
# 98 0 22 95 3 0 0 1 0 0 2751 0
# 222 0 31 120 1 0 0 0 0 2 4167 0
# 220 0 22 129 1 0 0 0 0 0 4111 0
#
# $rose_3
# low age lwt race smoke ptl ht ui ftv bwt pred
# 183 0 36 175 1 0 0 0 0 0 3600 0
# 86 0 33 155 3 0 0 0 0 3 2551 0
# 51 1 20 121 1 1 1 0 1 0 2296 1
# 17 1 23 97 3 0 0 0 1 1 1588 1
# 78 1 14 101 3 1 1 0 0 0 2466 1
# 167 0 16 135 1 1 0 0 0 0 3374 0
Data:
data('birthwt', package='MASS')
set.seed(42)
df_list <- replicate(3, birthwt[sample(nrow(birthwt), replace=TRUE), ], simplify=FALSE) |>
setNames(paste0('rose_', 1:3))