I have reviewed Error: Problem with mutate()
column (...) must be size 15 or 1, not 17192, How to drop columns with column names that contain specific string?, Remove columns that contain a specific word, and associated error troubleshooting.
I have a large dataset with viral data for different species in different areas - sample data below
Country ..2 Area Site ID Species Sample Original Sample/Specimen #
<chr> <lgl> <chr> <chr> <chr> <chr> <chr> <chr>
Tanzania NA UMNP UMNPhq AATPH PG Feces AATPHF2
Tanzania NA UMNP UMNPhq AATPI PG Feces AATPIF2
Tanzania NA UMNP UMNPhq AATPJ PG Feces AATPJF2
Tanzania NA UMNP UMNPhq ATTPK PG Feces ATTPKF2
Tanzania NA UMNP UMNPhq AATPL PG Feces AATPLF2
Filovirus (MOD) PCR Date (Filo MOD)
<chr> <date>
Indeterminant 2015-03-16
Indeterminant 2015-03-16
Indeterminant 2015-03-16
Indeterminant 2015-03-16
Negative 2015-03-16
I am trying to recode a viral status, positive or negative, for every sample id (just filovirus here, but there's a lot of them, so please help code more generally)
Code I've tried - first subsetting data to only include a specific area
viral <- subset(data, Area %in% "UMNP")
Here I got rid of unwanted columns and then was able to get infection status, but it converted all other information on the sample to "NA" causing additional error codes when I try to maintain the values.
viralres <- viral %>%
dplyr::select(-matches(c('Performed by ()', 'performed by', 'Date of', '1Performed by', 'Performed by', "Date ()", "...2"),)) %>%
mutate_if(is.character, ~case_when(. == "Indeterminant" ~ "0",
. == "Negative" ~ "0",
. == "Positive" ~ "1"))
Dput
structure(list(Country = c("Tanzania", "Tanzania", "Tanzania",
"Tanzania", "Tanzania"), ...2 = c(NA, NA, NA, NA, NA), Area = c("UMNP",
"UMNP", "UMNP", "UMNP", "UMNP"), Site = c("UMNPhq", "UMNPhq",
"UMNPhq", "UMNPhq", "UMNPhq"), `Animal ID` = c("AATPH", "AATPI",
"AATPJ", "ATTPK", "AATPL"), Species = c("Procolobus gordonorum",
"Procolobus gordonorum", "Procolobus gordonorum", "Procolobus gordonorum",
"Procolobus gordonorum"), `Sample Type` = c("Feces", "Feces",
"Feces", "Feces", "Feces"), `Original Sample/Specimen #` = c("AATPHF2",
"AATPIF2", "AATPJF2", "ATTPKF2", "AATPLF2"), `Filovirus (MOD) PCR` = c("Indeterminant",
"Indeterminant", "Indeterminant", "Indeterminant", "Negative"
), `Date (Filo MOD)` = structure(c(16510, 16510, 16510, 16510,
16510), class = "Date")), row.names = c(NA, -5L), class = c("tbl_df",
"tbl", "data.frame"))
Using mutate_if(is.character, ...)
will change all of your character columns. It looks like the only column you are trying to change is "Filovirus (MOD) PCR". So you could change the command to
viral %>%
dplyr::select(-matches(c('Performed by ()', 'performed by', 'Date of', '1Performed by', 'Performed by', "Date ()", "...2"),)) %>%
mutate(across(`Filovirus (MOD) PCR`, ~case_when(. == "Indeterminant" ~ "0",
. == "Negative" ~ "0",
. == "Positive" ~ "1")))
for the least amount of change. That way you are only changing that column. Alternatively you could more directly mutate that single column using case_match
viral %>%
dplyr::select(-matches(c('Performed by ()', 'performed by', 'Date of', '1Performed by', 'Performed by', "Date ()", "...2"),)) %>%
mutate(`Filovirus (MOD) PCR` = case_match(`Filovirus (MOD) PCR`,"Indeterminant" ~ "0",
"Negative" ~ "0",
"Positive" ~ "1"))
Note that case_match
was introduced in dplyr 1.1.0