I'm training a random forest three times with the caret package and saving each run's variable importance in a list. How can I calculate the mean importance of each feature across the runs in which it appears? For example, how can I calculate the mean of the three "Overall" values for "ESR"? (I am going to train this algorithm a thousand times.) Here is my example output:
[[1]]
rf variable importance
only 20 most important variables shown (out of 119)
Overall
Albumin 100.00
age 97.36
PR 60.18
RR 42.41
Weight 35.26
SystolicBP 32.14
Cancers1 29.79
ESR 27.66
Neutrophyl 26.98
CPK 25.68
EjectionFraction 25.59
BMI 24.42
Calcium 23.87
WBC 22.36
Urea 22.01
LDH 21.23
FBS 20.21
Ddimer 19.32
HB 18.99
Lymphocyte 18.78
[[2]]
rf variable importance
only 20 most important variables shown (out of 119)
Overall
age 100.00
FBS 57.80
WBC 53.88
PR 53.84
Neutrophyl 53.52
Weight 52.31
HB 51.69
LDH 50.15
Urea 49.31
Albumin 47.05
Lymphocyte 46.87
CPK 46.54
SystolicBP 45.64
Calcium 44.87
ESR 43.54
Ferritin 43.03
CRP 43.00
PLT 42.83
Creatinine 42.53
EjectionFraction 41.43
[[3]]
rf variable importance
only 20 most important variables shown (out of 119)
Overall
age 100.00
Albumin 43.41
Weight 24.88
FBS 24.63
BS 23.31
PR 21.47
LDH 21.06
Neutrophyl 20.68
BMI 17.94
EjectionFraction 17.29
CPK 16.49
WBC 16.11
ALP 15.72
RR 15.28
Lymphocyte 14.94
Cancers1 14.68
CRP 14.50
ESR 14.38
Ddimer 13.05
Ferritin 12.96
Can I create a data frame that stores the features and their Overall values? Thanks for helping. This is my code:
prediction_value_rf=list()
importance_rf=list()
auc_rf=list()
weight_rf=list()
for (i in 1:1000) {
  resample_death <- death[sample(nrow(death), size=300), ]
  resample_alive <- alive[sample(nrow(alive), size=300), ]
  f_dataset=rbind(resample_alive,resample_death)
  inx <- sample.split(seq_len(nrow(f_dataset)), 0.25)
  trainData <- f_dataset[!inx, ]
  testData <- f_dataset[inx, ]
  rf_fit <- train(vital_status ~ .,
                  data = trainData,
                  method = "rf")
  pred=predict(rf_fit, testData[,-109])
  pred1=predict(rf_fit, testData[,-109],type='prob')
  prediction_value_rf[[i]]=pred1[2]
  auc=auc(testData$vital_status,as.numeric(pred1[[2]]),direction="<", levels = levels(testData$vital_status))
  auc_rf[[i]]=auc
  a=varImp(rf_fit,scale = TRUE)
  importance_rf[[i]] <- a
  weight_rf[[i]]=max(rf_fit$results$Accuracy)
}
In the end, I want to calculate the mean of every feature's Overall value (I want to build an ensemble model). My dataset contains 109 features and 4200 samples.
> dput(importance_rf)
list(structure(list(importance = structure(list(Overall = c(100,
32.9191368970689, 0, 29.4889011862606, 24.8664587940577, 21.8746288172869,
21.7051171149606, 20.0868919191658, 20.3678665772965, 20.2873319598582,
33.7597621482843, 42.1891066454062, 22.7027798691687, 17.0766042463516,
39.4559095867264, 17.9431725056776, 23.2881573588367, 5.04721532342669,
22.3290849893345, 20.7266835722104, 21.5723519894789, 19.5211504808207,
21.2794742178794, 20.1624361665348, 13.7420140365184, 31.7941409073075,
20.9409991203303, 30.4229311296897, 11.5187371425859, 12.8487688047673,
9.40749461290917, 10.361793419014, 32.5677389075859, 26.5411449178312,
23.3996095888034, 2.84823906954271, 10.0257295515002, 2.27406632480383,
0.221285401034356, 0.844517489791465, 1.97286969198767, 0.0909347758420391,
0.541007254389242, 0.359718315763083, 1.26912866459011, 0.158954429130366,
0.245159217854806, 1.43768928047267, 0.796627703857018, 0.0731764363395144,
1.72357935713514, 0.424562470997031, 3.38312715168264, 1.88770244332681,
0.0314985706869475, 0, 0.65427952713802, 0, 0.0171557103229226,
0.709743254593806, 1.13539938842206, 0.0367104133426984, 2.95211595985093,
0, 0.582868854914444, 0.393813676879418, 1.15732422255054, 2.24940561099934,
1.73472209382337, 1.34428847541862, 1.15486784386305, 0, 0.689216959226089,
0.625678629482648, 1.81161997423301, 0.433030827900777, 10.9106578268112,
2.24295278032112, 18.176936900799, 1.74711580562318, 1.45310012173878,
0.952143653091356, 1.16652405720194, 1.11866015943186, 2.68527336222893,
1.12853921993574, 5.10727247259446, 1.93994049536545, 1.36475795626174,
2.95717137358439, 0.115367165512589, 0, 1.45815337045876, 0,
1.78943634306828, 5.71749991297189, 2.43536004133198, 1.27231795918686,
11.4771984230702, 3.0971032186365, 0.708058471655881, 0.170261025718881,
3.37435307537382, 1.56044494248123, 1.09294450754124, 0, 2.25592933845801,
2.30276525800757, 1.86149986210819, 1.46145976307003, 1.26858067553346,
2.11041986636824, 0.0902116364175813, 1.54299863875175, 0, 0.269632340125967,
1.88548693593634, 4.47233507072462, 0.66752451890319)), class = "data.frame", row.names = c("age",
"Weight", "HookhConsumption", "BMI", "SystolicBP", "RR", "DiastolicBP",
"ALP", "ALT", "AST", "Albumin", "BS", "CPK", "CRP", "Calcium",
"Creatinine", "Ddimer", "Directbilirubin", "ESR", "FBS", "Ferritin",
"HB", "LDH", "Lymphocyte", "Mg", "Neutrophyl", "PLT", "PR", "PhosphorP",
"PotassiumK", "SodiumNA", "Totalbilirubin", "Urea", "WBC", "EjectionFraction",
"TotalLungInvolvementRank", "TotalLungInvolvementPercent", "sex2",
"Type.of.heart.disease1", "Type.of.heart.disease2", "Type.of.heart.disease9",
"Unilateral.paralysis1", "Ulcers1", "Obesity.BMI.above.351",
"Peripheral.artery.disease1", "organ.involment.from.diabetes1",
"organ.involment.from.diabetes2", "organ.involment.from.diabetes3",
"UsingDrugHistory1", "UsingAlcoholHistory1", "Transplantation1",
"SeverityofKidneyDisease1", "SeverityofKidneyDisease2", "SeverityofKidneyDisease3",
"SeverityChronicliverdisease1", "SeverityChronicliverdisease2",
"SeverityChronicliverdisease3", "SeverityChronicliverdisease4",
"SeverityChronicliverdisease9", "Schizophrenia1", "Rheumatologicaldiseases1",
"Pregnant1", "Neurologicaldiseases1", "LiverTransplantation1",
"KidneyTransplantation1", "Immunedeficiencydisease1", "Hypothyroidism1",
"Hypertention1", "Hyperlipidemia1", "Historyofsmoking1", "HistoryofHookah1",
"HeartTransplantation1", "HIV1", "FattyLiver1", "Diabetes1",
"Chronicliverdisease1", "Chronickidneydisease1", "CardiovascularDisease1",
"Cancers1", "CVAStrokeCVDTIA1", "COPD1", "Asthma1", "WetCough1",
"WeightLoss1", "WeaknessandLethargy1", "Vomit1", "Trembling1",
"Sweating1", "Sputum1", "Sorethroat1", "SkinRush1", "Rush1",
"Rhinorrhea1", "PharynxExoda1", "Nausea1", "Muscle_Painmyalgia1",
"Lossofsenseoftaste1", "Lossofsenseofsmell1", "LossofConsciousness1",
"LimbEdema1", "Jointpain_Arthralgia1", "Hemoptysis1", "Headace1",
"Fever1", "Fatigue1", "EyeConjunctivitis1", "Epigastric1", "Dyspnea1",
"DryCough1", "Dizziness1", "Diarrhea1", "Chestpain1", "CardiacArrhythmia1",
"Body_Pain1", "Bleeding1", "Ataxia1", "Anorexia1", "PCRCOVID19Test1",
"PCRCOVID19Test2")), model = "rf", calledFrom = "varImp"), class = "varImp.train"),
structure(list(importance = structure(list(Overall = c(100,
36.8463357663146, 0, 20.5921448468941, 35.0980630859042,
15.7098956910968, 27.5542325637653, 22.3935810225052, 25.6062709809081,
18.9072078537409, 30.5428709528983, 26.4061314161858, 27.2933977255992,
18.3744993875278, 57.5115149169245, 14.4361277134982, 49.9265957132235,
6.10831602661626, 28.2527379885906, 23.0147565449908, 32.7997892888894,
22.7055707536584, 36.9763807158356, 28.9941599048441, 17.8186386653819,
31.2682240107287, 26.2894098494535, 41.1751827476675, 22.6316241605114,
16.9314172346857, 14.4927913128733, 13.1792980470757, 44.2836496383372,
32.7246002717468, 30.3912750391576, 10.0409713536124, 9.83444013035946,
2.50470824612248, 1.72055335723373, 1.05083165735798, 1.56193393834476,
0.233521622728958, 1.08064736921506, 0.555709266569136, 2.40106539585553,
0.291833555475466, 0.380999891346632, 2.56592221397732, 1.62107348934456,
0.504647559430998, 1.19859835755469, 0, 1.4382135880929,
1.94514657535966, 0, 0.0569205442253742, 0.44589056596685,
0.0539230755197555, 0, 0.055077983652405, 1.24527213390211,
0, 1.36267778294481, 0.151259347248717, 0.499919817645286,
0, 2.79981213016671, 2.72663427247346, 1.93725253183476,
2.70715099933653, 1.99722906280419, 0, 0.111342938271961,
1.2426657762317, 2.15186257620788, 0.584084013981451, 9.87542370836023,
3.21493418783175, 14.6556614893423, 0.67462103889104, 0.787088521176588,
2.61946726039402, 2.8099384934716, 0.377053883833586, 2.2824838493133,
1.12217532020233, 3.44210364347885, 2.61343827037804, 9.58864870521531,
1.77823199575717, 0, 0, 0.828679129518211, 0, 2.73842874693014,
14.5506870851474, 0.390367251047195, 0.811902694072225, 15.5803912323052,
4.18258978600944, 2.13546475796113, 2.66088800284236, 2.97761832225233,
3.54039994200135, 2.44519084017892, 0.737528372419208, 2.20708600548186,
4.12502178170407, 3.1835668678093, 7.61195991815971, 2.35303302862437,
5.70342032074721, 0.409606955773683, 2.4977310780031, 0.0107020031498121,
0.268000372472171, 2.32396173268619, 1.64515893404575, 0.868523484401606
)), class = "data.frame", row.names = c("age", "Weight",
"HookhConsumption", "BMI", "SystolicBP", "RR", "DiastolicBP",
"ALP", "ALT", "AST", "Albumin", "BS", "CPK", "CRP", "Calcium",
"Creatinine", "Ddimer", "Directbilirubin", "ESR", "FBS",
"Ferritin", "HB", "LDH", "Lymphocyte", "Mg", "Neutrophyl",
"PLT", "PR", "PhosphorP", "PotassiumK", "SodiumNA", "Totalbilirubin",
"Urea", "WBC", "EjectionFraction", "TotalLungInvolvementRank",
"TotalLungInvolvementPercent", "sex2", "Type.of.heart.disease1",
"Type.of.heart.disease2", "Type.of.heart.disease9", "Unilateral.paralysis1",
"Ulcers1", "Obesity.BMI.above.351", "Peripheral.artery.disease1",
"organ.involment.from.diabetes1", "organ.involment.from.diabetes2",
"organ.involment.from.diabetes3", "UsingDrugHistory1", "UsingAlcoholHistory1",
"Transplantation1", "SeverityofKidneyDisease1", "SeverityofKidneyDisease2",
"SeverityofKidneyDisease3", "SeverityChronicliverdisease1",
"SeverityChronicliverdisease2", "SeverityChronicliverdisease3",
"SeverityChronicliverdisease4", "SeverityChronicliverdisease9",
"Schizophrenia1", "Rheumatologicaldiseases1", "Pregnant1",
"Neurologicaldiseases1", "LiverTransplantation1", "KidneyTransplantation1",
"Immunedeficiencydisease1", "Hypothyroidism1", "Hypertention1",
"Hyperlipidemia1", "Historyofsmoking1", "HistoryofHookah1",
"HeartTransplantation1", "HIV1", "FattyLiver1", "Diabetes1",
"Chronicliverdisease1", "Chronickidneydisease1", "CardiovascularDisease1",
"Cancers1", "CVAStrokeCVDTIA1", "COPD1", "Asthma1", "WetCough1",
"WeightLoss1", "WeaknessandLethargy1", "Vomit1", "Trembling1",
"Sweating1", "Sputum1", "Sorethroat1", "SkinRush1", "Rush1",
"Rhinorrhea1", "PharynxExoda1", "Nausea1", "Muscle_Painmyalgia1",
"Lossofsenseoftaste1", "Lossofsenseofsmell1", "LossofConsciousness1",
"LimbEdema1", "Jointpain_Arthralgia1", "Hemoptysis1", "Headace1",
"Fever1", "Fatigue1", "EyeConjunctivitis1", "Epigastric1",
"Dyspnea1", "DryCough1", "Dizziness1", "Diarrhea1", "Chestpain1",
"CardiacArrhythmia1", "Body_Pain1", "Bleeding1", "Ataxia1",
"Anorexia1", "PCRCOVID19Test1", "PCRCOVID19Test2")), model = "rf",
calledFrom = "varImp"), class = "varImp.train"), structure(list(
importance = structure(list(Overall = c(100, 36.4519408382731,
0.0121282468302786, 27.9982404793903, 19.4487163883379,
24.6079653972917, 14.1539998143239, 18.684018340339,
20.1182663550791, 17.4200861293186, 46.6309831468223,
52.2217679510578, 28.5910698857479, 16.845796014194,
31.6509235655573, 17.1000574614637, 27.8424176478161,
5.69845064904499, 21.3838903337718, 20.217605303817,
19.8702958841878, 22.3737582989512, 33.0788664305301,
20.6035947546629, 16.3220426343042, 23.4809287675538,
23.1749036748423, 57.122094059206, 12.2409421568247,
11.234114301956, 15.7946508155502, 8.80563729211453,
20.2205078755919, 20.3091908316546, 27.7497357152039,
3.8622908315769, 12.8894291926347, 5.96701805516155,
0.761922263853243, 1.41991036581607, 1.54560737492769,
0.825161722105208, 0.0172016746252156, 0.693982409239905,
0, 0.358366468201754, 1.74812586771487, 2.2746344067366,
0.745595100629448, 0.465199425668223, 0.408092232849501,
0.115358703965213, 0.0358338604150282, 2.88640197248697,
0, 0.288302498762889, 0.332551323637155, 0.0121282468302786,
0, 1.03515126482736, 1.1213600137207, 0.329413397366096,
2.0612368962315, 0, 0.610994615626186, 1.0215655608971,
3.90651448858199, 1.73374217783332, 1.47244358073369,
2.20534241559288, 0.173681720638885, 0, 0.631950099628902,
0.132328128708788, 2.92435478031454, 1.03537122788376,
4.74067414123091, 1.77981701502525, 13.1150432121738,
0.720556880972878, 1.20366662244445, 1.19169376389038,
1.86442992849398, 0.518200723424615, 2.278501378269,
1.23638371282217, 3.66947066761794, 2.03933409738165,
1.25289331603719, 1.01627904400807, 0.0324453169731015,
0, 2.29817177168672, 0, 1.53194610140319, 7.15322639329996,
0.759542631415349, 1.53353473284619, 4.77390474517756,
1.05656481042379, 0.699450154375729, 1.16224285818854,
3.65223350861514, 1.93274707207956, 1.57589588221639,
0.449432695377871, 1.36863730886437, 2.11275137384133,
3.29450357362525, 1.08676677214028, 2.18565092410049,
1.15456248328987, 0.492245547306216, 1.59592156033113,
0.0129367966189638, 0.514499765305734, 1.58591810753971,
1.84832826238423, 0.807564130566264)), class = "data.frame", row.names = c("age",
"Weight", "HookhConsumption", "BMI", "SystolicBP", "RR",
"DiastolicBP", "ALP", "ALT", "AST", "Albumin", "BS",
"CPK", "CRP", "Calcium", "Creatinine", "Ddimer", "Directbilirubin",
"ESR", "FBS", "Ferritin", "HB", "LDH", "Lymphocyte",
"Mg", "Neutrophyl", "PLT", "PR", "PhosphorP", "PotassiumK",
"SodiumNA", "Totalbilirubin", "Urea", "WBC", "EjectionFraction",
"TotalLungInvolvementRank", "TotalLungInvolvementPercent",
"sex2", "Type.of.heart.disease1", "Type.of.heart.disease2",
"Type.of.heart.disease9", "Unilateral.paralysis1", "Ulcers1",
"Obesity.BMI.above.351", "Peripheral.artery.disease1",
"organ.involment.from.diabetes1", "organ.involment.from.diabetes2",
"organ.involment.from.diabetes3", "UsingDrugHistory1",
"UsingAlcoholHistory1", "Transplantation1", "SeverityofKidneyDisease1",
"SeverityofKidneyDisease2", "SeverityofKidneyDisease3",
"SeverityChronicliverdisease1", "SeverityChronicliverdisease2",
"SeverityChronicliverdisease3", "SeverityChronicliverdisease4",
"SeverityChronicliverdisease9", "Schizophrenia1", "Rheumatologicaldiseases1",
"Pregnant1", "Neurologicaldiseases1", "LiverTransplantation1",
"KidneyTransplantation1", "Immunedeficiencydisease1",
"Hypothyroidism1", "Hypertention1", "Hyperlipidemia1",
"Historyofsmoking1", "HistoryofHookah1", "HeartTransplantation1",
"HIV1", "FattyLiver1", "Diabetes1", "Chronicliverdisease1",
"Chronickidneydisease1", "CardiovascularDisease1", "Cancers1",
"CVAStrokeCVDTIA1", "COPD1", "Asthma1", "WetCough1",
"WeightLoss1", "WeaknessandLethargy1", "Vomit1", "Trembling1",
"Sweating1", "Sputum1", "Sorethroat1", "SkinRush1", "Rush1",
"Rhinorrhea1", "PharynxExoda1", "Nausea1", "Muscle_Painmyalgia1",
"Lossofsenseoftaste1", "Lossofsenseofsmell1", "LossofConsciousness1",
"LimbEdema1", "Jointpain_Arthralgia1", "Hemoptysis1",
"Headace1", "Fever1", "Fatigue1", "EyeConjunctivitis1",
"Epigastric1", "Dyspnea1", "DryCough1", "Dizziness1",
"Diarrhea1", "Chestpain1", "CardiacArrhythmia1", "Body_Pain1",
"Bleeding1", "Ataxia1", "Anorexia1", "PCRCOVID19Test1",
"PCRCOVID19Test2")), model = "rf", calledFrom = "varImp"), class = "varImp.train"))
For this part:
how can I calculate the mean of each feature if it exists? for example, how can I calculate the mean of three overall "ESR"?
Because you have already generated the list, you can write a function that selects the row for a given feature name, apply it to each element of the list, flatten the result, and then calculate the mean. If the feature is missing from some element, that element yields NA and can be excluded from the mean by using na.rm = TRUE.
For example, this resembles your list:
mylist <- list(structure(list(Overall = c(100, 97.36, 60.18, 42.41, 35.26,
32.14, 29.79, 27.66, 26.98, 25.68, 25.59, 24.42, 23.87, 22.36,
22.01, 21.23, 20.21, 19.32, 18.99, 18.78)), class = "data.frame", row.names = c("Albumin",
"age", "PR", "RR", "Weight", "SystolicBP", "Cancers1", "ESR",
"Neutrophyl", "CPK", "EjectionFraction", "BMI", "Calcium", "WBC",
"Urea", "LDH", "FBS", "Ddimer", "HB", "Lymphocyte")), structure(list(
Overall = c(100, 57.8, 53.88, 53.84, 53.52, 52.31, 51.69,
50.15, 49.31, 47.05, 46.87, 46.54, 45.64, 44.87, 43.54, 43.03,
43, 42.83, 42.53, 41.43)), class = "data.frame", row.names = c("age",
"FBS", "WBC", "PR", "Neutrophyl", "Weight", "HB", "LDH", "Urea",
"Albumin", "Lymphocyte", "CPK", "SystolicBP", "Calcium", "ESR",
"Ferritin", "CRP", "PLT", "Creatinine", "EjectionFraction")),
structure(list(Overall = c(100, 43.41, 24.88, 24.63, 23.31,
21.47, 21.06, 20.68, 17.94, 17.29, 16.49, 16.11, 15.72, 15.28,
14.94, 14.68, 14.5, 14.38, 13.05, 12.96)), class = "data.frame", row.names = c("age",
"Albumin", "Weight", "FBS", "BS", "PR", "LDH", "Neutrophyl",
"BMI", "EjectionFraction", "CPK", "WBC", "ALP", "RR", "Lymphocyte",
"Cancers1", "CRP", "ESR", "Ddimer", "Ferritin")))
Here is how to calculate the mean of ESR, which exists in all elements, and of CRP, which is missing from one of the elements:
mylist |> lapply(function(dat) dat["ESR", "Overall"]) |> unlist() |> mean(na.rm = TRUE)
#[1] 28.52667
mylist |> lapply(function(dat) dat["CRP", "Overall"]) |> unlist() |> mean(na.rm = TRUE)
#[1] 28.75
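The reason na.rm = TRUE is enough here is that indexing a data frame by a row name it does not contain returns NA rather than raising an error. A minimal sketch with two made-up data frames (toy_a and toy_b are hypothetical names, not part of your data):

```r
# Two toy importance tables; "CRP" is missing from the second one.
toy_a <- data.frame(Overall = c(50, 30), row.names = c("ESR", "CRP"))
toy_b <- data.frame(Overall = 40, row.names = "ESR")

# Indexing by a row name that is absent gives NA instead of an error.
crp_values <- c(toy_a["CRP", "Overall"], toy_b["CRP", "Overall"])
crp_values

# na.rm = TRUE drops the NA, so the mean uses only the runs that have CRP.
crp_mean <- mean(crp_values, na.rm = TRUE)
crp_mean
```

Keep in mind that a feature that shows up in only a few runs is then averaged over those few runs only, so its mean is not directly comparable to one averaged over all runs.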
Because you have many features, you can create another function to apply this step to each feature. For example:
features <- c("ESR", "CRP", "CPK", "WBC", "LDH")
feature_mean <- function(feature_name){
out <- lapply(mylist, function(dat) dat[feature_name, "Overall"])|>
unlist() |> mean(na.rm = TRUE) |>
setNames(paste0("mean_",feature_name))
return(out)
}
features |> lapply(feature_mean) |> unlist()
#mean_ESR mean_CRP mean_CPK mean_WBC mean_LDH
#28.52667 28.75000 29.57000 30.78333 30.81333
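If you would rather get a data frame with every feature and its mean, as asked in the question, one possible sketch is to collect the union of all row names and average row-wise. The list runs below is invented toy data, and the column names mean_overall and n_runs are just my choice for illustration:

```r
# Toy stand-in for the list of importance tables ("CRP" is absent in run 2)
runs <- list(
  data.frame(Overall = c(100, 60, 20), row.names = c("age", "ESR", "CRP")),
  data.frame(Overall = c(100, 40),     row.names = c("age", "ESR")),
  data.frame(Overall = c(80, 50, 10),  row.names = c("age", "ESR", "CRP"))
)

# Every feature that appears in at least one run
all_features <- unique(unlist(lapply(runs, rownames)))

# Matrix with one row per feature and one column per run (NA where absent)
imp_matrix <- sapply(runs, function(dat) dat[all_features, "Overall"])
rownames(imp_matrix) <- all_features

# Data frame of features, their mean importance, and how many runs had them
mean_importance <- data.frame(
  feature      = all_features,
  mean_overall = rowMeans(imp_matrix, na.rm = TRUE),
  n_runs       = rowSums(!is.na(imp_matrix)),
  row.names    = NULL
)
mean_importance
```

The n_runs column is there so you can see how many runs each mean is based on.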
EDIT
The synthetic data used in the previous example, mylist, contains only one data frame (with the "Overall" column) in each of its elements, so the feature can be extracted directly with lapply. However, the actual data that you provided in the updated question, importance_rf, has more than one object in each of its elements, and the data frame with the "Overall" column is the first of them. That difference is the cause of the error you showed in the comment. To apply the extraction, first pull out the data frames using lapply(function(list) list[[1]]), and then apply the previous steps.
# Extract mean ESR
importance_rf |>
lapply(function(list) list[[1]]) |>
lapply(function(dat) dat["ESR", "Overall"]) |>
unlist() |>
mean(na.rm = TRUE)
#[1] 23.98857
# Extract mean CRP
importance_rf |>
lapply(function(list) list[[1]]) |>
lapply(function(dat) dat["CRP", "Overall"]) |>
unlist() |>
mean(na.rm = TRUE)
#[1] 17.4323
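Since the dput output shows that the data frame inside each element is stored under the name importance, it can also be pulled out by name instead of by position. The sketch below uses a small hand-built stand-in for that structure (fake_run and toy_importance are hypothetical, made up so that the example runs without caret):

```r
# Build a stand-in with the same shape as one element of importance_rf
fake_run <- function(values, features) {
  structure(
    list(
      importance = data.frame(Overall = values, row.names = features),
      model      = "rf",
      calledFrom = "varImp"
    ),
    class = "varImp.train"
  )
}

toy_importance <- list(
  fake_run(c(100, 22, 17), c("age", "ESR", "CRP")),
  fake_run(c(100, 28, 18), c("age", "ESR", "CRP"))
)

# Extract by name ($importance) rather than by position ([[1]])
esr_values <- vapply(
  toy_importance,
  function(run) run$importance["ESR", "Overall"],
  numeric(1)
)
esr_mean <- mean(esr_values, na.rm = TRUE)
esr_mean
```

Extracting by name keeps working even if the order of the objects inside each element were to change.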
A {base R} way
The previous steps can be applied to a vector of features as follows:
features <- c("ESR", "CRP", "CPK", "WBC", "LDH")
feature_mean <- function(feature_name){
out <- importance_rf |>
lapply(function(list) list[[1]]) |>
lapply(function(dat) dat[feature_name, "Overall"])|>
unlist() |> mean(na.rm = TRUE) |>
setNames(paste0("mean_",feature_name))
return(out)
}
# Extract the mean values
features |> lapply(feature_mean) |> unlist()
#mean_ESR mean_CRP mean_CPK mean_WBC mean_LDH
#23.98857 17.43230 26.19575 26.52498 30.44491
A brief explanation about the code:
lapply(function(list) list[[1]]) extracts the first element of each element in the importance_rf list, which is the data frame that contains the features.
dat[feature_name, "Overall"] extracts the value of the targeted feature, feature_name, from each extracted data frame. Only one feature is extracted from each data frame in every step.
unlist() converts the extracted values from a list to a numeric vector.
setNames creates names for the numeric vector to make it easy to identify the features whose means are being calculated.
The functions used in this way all belong to base R. You don't need to install any external package to get them.
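The feature_mean function above reads the list from the global environment. A variant that takes the list as an argument might look like the sketch below; imp_list is invented toy data and feature_means is a hypothetical name, so treat it as one possible arrangement rather than the only way:

```r
# imp_list: toy list of data frames with an "Overall" column and
# feature names as row names ("WBC" is absent from the second one)
imp_list <- list(
  data.frame(Overall = c(10, 20, 30), row.names = c("ESR", "CPK", "WBC")),
  data.frame(Overall = c(30, 40),     row.names = c("ESR", "CPK"))
)

feature_means <- function(imp_list, features) {
  means <- vapply(features, function(feature_name) {
    # one value per data frame; NA where the feature is absent
    values <- vapply(imp_list,
                     function(dat) dat[feature_name, "Overall"],
                     numeric(1))
    mean(values, na.rm = TRUE)
  }, numeric(1))
  setNames(means, paste0("mean_", features))
}

result <- feature_means(imp_list, c("ESR", "CPK", "WBC"))
result
```

Using vapply with numeric(1) makes the function fail early if an extraction ever returns something other than a single number.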
Another option is to combine base R functions with functions from the purrr package.
A {purrr} way
library(purrr)
importance_rf |>
map(pluck(1,1)) |>
map(function(dat) set_names(dat[features,], features)) |>
as.data.frame() |>
rowMeans() |>
set_names(paste0("mean_", features))
#mean_ESR mean_CRP mean_CPK mean_WBC mean_LDH
#23.98857 17.43230 26.19575 26.52498 30.44491
These steps are much shorter than the ones in base R above, but what is done in each step might be less obvious.
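Note that map is similar to lapply, and pluck(x, 1, 1) is equivalent to x[[1]][[1]]. In map(pluck(1,1)), the call pluck(1, 1) simply evaluates to 1 before map sees it, so that step behaves like map(1), which takes the first element of each list entry.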
A brief explanation about the code:
map(pluck(1,1)) extracts the data frames, similar to lapply(function(list) list[[1]]) above.
map(function(dat) set_names(dat[features,], features)) extracts the targeted features, similar to dat[feature_name, "Overall"] above.
There is a difference:
In the base R way above, one feature is extracted from all data frames and its mean is calculated, and then the next feature is processed the same way.
In this purrr way, all the targeted features are extracted from each data frame in the list, and the results are then combined into a new data frame using as.data.frame, so that each row represents a feature. Then rowMeans is used to calculate the mean of each row, that is, of each feature.
Note that you can check the result of each step by running the code up to the corresponding |> pipe. For example, importance_rf will show all objects in each element, and importance_rf |> map(pluck(1,1)) will show only the data frame objects.
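One more difference worth checking: the base R steps use mean(na.rm = TRUE), while rowMeans() is called without na.rm, so a feature that is missing from any run would come out as NA. That does not happen with importance_rf, where every run has all features, but here is a small base R sketch of the effect on invented data (toy_runs and per_run are hypothetical names):

```r
# Toy data: "CRP" is missing from the second run
features <- c("ESR", "CRP")
toy_runs <- list(
  data.frame(Overall = c(20, 10), row.names = c("ESR", "CRP")),
  data.frame(Overall = 40,        row.names = "ESR")
)

# One column per run, one row per feature (same layout as in the steps above)
cols <- lapply(toy_runs, function(dat) setNames(dat[features, "Overall"], features))
names(cols) <- paste0("run", seq_along(cols))
per_run <- as.data.frame(cols)

# Without na.rm, the mean of CRP is NA because one run lacks it
means_default <- rowMeans(per_run)
means_default

# With na.rm = TRUE, the missing run is skipped
means_narm <- rowMeans(per_run, na.rm = TRUE)
means_narm
```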
Here is a simple example of how to calculate weighted means of each feature in your list. Suppose you have this list:
some.list <- list(L1 = c(a = 2, b = 4, c = 7),
L2 = c(a = 5, b = 5, c = 2),
L3 = c(a = 3, b = 3, c = 6))
some.list
$L1
a b c
2 4 7
$L2
a b c
5 5 2
$L3
a b c
3 3 6
And suppose you have the following weight values for L1, L2, and L3 in the list:
weight <- c(w.L1 = 0.5, w.L2=0.6, w.L3 = 0.9)
weight
w.L1 w.L2 w.L3
0.5 0.6 0.9
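To calculate the weighted mean of a, for example, you need this calculation, where a1, a2, a3 are the values of a in L1, L2, L3 and w1, w2, w3 are the corresponding weights:
weighted mean of a = (w1*a1 + w2*a2 + w3*a3) / (w1 + w2 + w3)
You can get this by multiplying each value of a in the list by its respective normalized weight and summing the products. In this case, the normalized weight for w1 is w1/(w1+w2+w3).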
To do these steps in R:
norm.weight <- weight/sum(weight)
norm.weight
w.L1 w.L2 w.L3
0.25 0.30 0.45
# weighted means of a,b, and c
some.list |> map2(norm.weight, `*`) |> as.data.frame() |> rowSums()
a b c
3.35 3.85 5.05
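As a cross-check, base R's weighted.mean() does the normalization internally, so it can take the raw weights directly. A short sketch using the same some.list and weight values as above (value_matrix is just a helper name I picked):

```r
some.list <- list(L1 = c(a = 2, b = 4, c = 7),
                  L2 = c(a = 5, b = 5, c = 2),
                  L3 = c(a = 3, b = 3, c = 6))
weight <- c(w.L1 = 0.5, w.L2 = 0.6, w.L3 = 0.9)

# One column per list element, one row per feature
value_matrix <- do.call(cbind, some.list)

# Apply weighted.mean() to every row, using the raw (unnormalized) weights
weighted <- apply(value_matrix, 1, weighted.mean, w = weight)
weighted
```

This should give the same values for a, b, and c as the map2 version.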
Applying these mock weight values to your importance_rf list and the features in the example, we get:
importance_rf |>
map(pluck(1,1)) |>
map(function(dat) set_names(dat[features,], features)) |>
map2(norm.weight, `*`) |>
as.data.frame() |>
rowSums()
ESR CRP CPK WBC LDH
23.68084 17.36211 26.72970 25.59180 31.29827
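In your loop you already store one accuracy per run in weight_rf, so presumably those would replace the mock weights. Below is a base R sketch of that idea on invented data; toy_importance and toy_weight are hypothetical stand-ins for the data frames extracted from importance_rf and for weight_rf, and it assumes every feature is present in every run:

```r
# Toy stand-ins (invented values) for the extracted data frames and weight_rf
toy_importance <- list(
  data.frame(Overall = c(20, 10), row.names = c("ESR", "CRP")),
  data.frame(Overall = c(30, 20), row.names = c("ESR", "CRP")),
  data.frame(Overall = c(40, 30), row.names = c("ESR", "CRP"))
)
toy_weight <- list(0.5, 0.6, 0.9)   # e.g. one accuracy per run

features <- c("ESR", "CRP")

# Normalize the weights so that they sum to 1
norm.weight <- unlist(toy_weight) / sum(unlist(toy_weight))

# rows = features, columns = runs
imp_matrix <- sapply(toy_importance, function(dat) dat[features, "Overall"])
rownames(imp_matrix) <- features

# Weighted mean per feature = importance matrix times normalized weights
weighted_importance <- drop(imp_matrix %*% norm.weight)
weighted_importance
```

If some features can be missing from some runs, the NA values would need to be handled before the matrix product, for example by renormalizing the weights per feature.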