The dataset being used is in this Google Sheets link: https://docs.google.com/spreadsheets/d/1eV33Sgx_UVtk2vDtNBc4Yqs_kQoeffY0oj5gSCq9rCs/edit?usp=sharing
AMC.dataset$ExamMC.A<-surveySP15$Exams_A
AMC.dataset$ExamMC.A<-factor(NA, levels=c("TRUE", "FALSE"))
AMC.dataset$ExamMC.A[AMC.dataset$Exams_A=="1 time"|AMC.dataset$Exams_A=="2-4 times"|AMC.dataset$Exams_A==">4 times"]<-"TRUE"
AMC.dataset$ExamMC.A[AMC.dataset$Exams_A=="0 times"]<-"FALSE"
AMC.dataset$ExamMC.A=as.logical(AMC.dataset$ExamMC.A)
I use these 5 lines of code to recode each of the 9 variables Exams_A through Exams_I into a logical binary outcome that is TRUE for anyone who answered 1 or more times. I would like to combine all of these variables into a new column in the dataset: for each observation row, if even one of the 9 recoded variables (A through I) is TRUE, the new variable should read TRUE, meaning the respondent has committed at least one of the 9 types of exam academic misconduct recorded in the dataset. If there are no TRUE outcomes in the row, I would like the new variable to read FALSE, meaning the respondent has never committed exam academic misconduct.
The code I have for this new variable is:
AMC.dataset$ExamMC = any(AMC.dataset$ExamMC.A, AMC.dataset$ExamMC.B, AMC.dataset$ExamMC.C, AMC.dataset$ExamMC.D, AMC.dataset$ExamMC.E, AMC.dataset$ExamMC.F, AMC.dataset$ExamMC.G, AMC.dataset$ExamMC.H, AMC.dataset$ExamMC.I)
However, the result appears to be overridden by the last variable in the call (AMC.dataset$ExamMC.I), which has 215 FALSE cases and 0 TRUE: the new variable comes out as 215 FALSE cases, even though other variables may hold TRUE outcomes.
EDIT
I have now created a data frame for the set of exam misconduct variables:
AMC.dataset$ExamMCdf<-data.frame(AMC.dataset$ExamMC.A, AMC.dataset$ExamMC.B, AMC.dataset$ExamMC.C, AMC.dataset$ExamMC.D, AMC.dataset$ExamMC.E, AMC.dataset$ExamMC.F, AMC.dataset$ExamMC.G, AMC.dataset$ExamMC.H, AMC.dataset$ExamMC.I)
Now my question is how to create a composite variable in a new column that reads through each observation row and labels any row with even a single TRUE outcome in the data frame as TRUE. Any observation row with no TRUE outcomes should be labeled FALSE by the composite variable.
Thanks for all of your help.
To make a composite column that checks for any TRUE values in the other data frame columns, use the any() function wrapped in apply() to go row by row. I think you can apply it to your situation:
#Makes a dataframe with TRUE/FALSE values and a low chance for TRUE
set.seed(123)
data <- data.frame(
Exams_A = sample(c(TRUE,FALSE), 10, TRUE, c(.1, .9)),
Exams_B = sample(c(TRUE,FALSE), 10, TRUE, c(.1, .9)),
Exams_C = sample(c(TRUE,FALSE), 10, TRUE, c(.1, .9)),
Exams_D = sample(c(TRUE,FALSE), 10, TRUE, c(.1, .9)),
Exams_E = rep(TRUE, 10) # Column of all TRUEs, to show that you can limit which columns are searched
)
# data[, 1:4] sets which columns to search (Exams_E is deliberately left out)
data$ExamMC <- apply(data[, 1:4], 1, function(x) any(x))
data$ExamMC <- apply(data[, 1:4], 1, any) # Updated, equivalent version: any is passed directly
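As for why the original attempt did not work: any() collapses everything it is given into a single TRUE/FALSE, and that one value is then recycled down the whole column, so it can never vary by row. Below is a minimal sketch of that behaviour, together with a vectorized alternative using rowSums(). The data frame `toy` and its values are made up for illustration and are not taken from your dataset, and `na.rm = TRUE` is only an assumption about how you would want missing answers treated.

```r
# Hypothetical toy data, not the real survey data
toy <- data.frame(
  ExamMC.A = c(TRUE, FALSE, FALSE, NA),
  ExamMC.B = c(FALSE, FALSE, TRUE, FALSE),
  ExamMC.C = c(FALSE, FALSE, FALSE, FALSE)
)

# any() over whole columns returns ONE value for the entire dataset
collapsed <- any(toy$ExamMC.A, toy$ExamMC.B, toy$ExamMC.C, na.rm = TRUE)

# Row-wise alternative: TRUE counts as 1, so a row sum above 0
# means the row holds at least one TRUE.
# na.rm = TRUE is an assumption: NAs are ignored rather than propagated.
exam_cols <- c("ExamMC.A", "ExamMC.B", "ExamMC.C")
toy$ExamMC <- rowSums(toy[, exam_cols], na.rm = TRUE) > 0

# Same idea with apply(), passing na.rm through to any()
toy$ExamMC_apply <- apply(toy[, exam_cols], 1, any, na.rm = TRUE)
```

On a data frame with many rows, rowSums() is likely to be faster than apply(), but both should give the same answer. If you would rather have a row with missing answers and no TRUE come out as NA, drop the `na.rm = TRUE` argument.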