I wrote an R function to classify rows (individuals) based on the columns output in an incidence matrix M5
for the following requirements:
M5 <- structure(c(1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1), dim = c(5L, 3L),
dimnames = list(c("1", "2", "3", "4", "5"), NULL))
Requirements:
Suppose Column 1 and Column 2 are negative (N) and Column 3 is positive (P) in M5. Then:
• Since Column 1 is negative, individuals 1 and 2 are confirmed negative (CN).
• Since Column 2 is negative, individuals 2, 3, and 4 are confirmed negative (CN).
• Since Column 3 is positive, at least one of individuals 4 or 5 is positive. However, individual 4 is already confirmed negative from Column 1, so the positive case must be individual 5 so individual 5 is confirmed positive (CP).
Suppose Column 1 is negative (N) and Column 2 and Column 3 are positive (P) in M5. Then:
• Since Column 1 is negative, individuals 1 and 2 are confirmed negative (CN).
• Since Column 2 is positive, at least one of individuals 2, 3, or 4 is positive. However, individual 2 is already confirmed negative from Column 1, so the positive case must be among individuals 3 and/or 4.
• Since Column 3 is positive, at least one of individuals 4 and 5 is positive.
• Therefore, individuals 1 and 2 are confirmed negative (CN), while individuals 3, 4, and 5 are suspected positive (SP).
library(tidyverse)
library(data.table, warn.conflicts = FALSE, quietly = TRUE)
GTA <-
function(incidence_matrix, test_results) {
incidence_dt <- as.data.table(incidence_matrix, keep.rownames = "individual")
determine_status <- function(row, results) {
if (any(row == 1 & results == "N")) {
return("Confirm Negative (CN)")
}
pos_pools <- which(row == 1 & results == "P")
neg_pools <- which(row == 1 & results == "N")
if (length(pos_pools) > 0) {
if (length(neg_pools) == 0) {
return("Confirm Positive (CP)")
} else {
return("Suspected Positive (SP)")
}
}
return("Undetermined")
}
# Apply the function to each individual
incidence_dt <-
incidence_dt %>%
rowwise() %>%
mutate(status = determine_status(c_across(starts_with("V")), test_results)) %>%
ungroup()
# Identify unique suspected positives in each pool and update their status
incidence_dt <-
incidence_dt %>%
pivot_longer(cols = starts_with("V"), names_to = "pool", values_to = "value") %>%
group_by(pool) %>%
mutate(unique_sp = if_else(value == 1 & status == "Suspected Positive (SP)",
sum(status == "Suspected Positive (SP)") == 1, FALSE)) %>%
ungroup() %>%
mutate(status = if_else(unique_sp & status == "Suspected Positive (SP)",
"Confirm Positive (CP)", status)) %>%
select(individual, status) %>%
distinct()
# Update status based on confirmed negatives (CN)
incidence_dt <-
incidence_dt %>%
rowwise() %>%
mutate(status = case_when(
status == "Confirm Positive (CP)" & any(incidence_dt$status == "Confirm Negative (CN)") ~ "Suspected Positive (SP)",
TRUE ~ status
)) %>%
ungroup()
# Convert to data.table
incidence_dt <- as.data.table(incidence_dt)
# Create columns for CN, CP, SP
status_dt <-
incidence_dt %>%
mutate(CN = if_else(status == "Confirm Negative (CN)", 1, 0),
CP = if_else(status == "Confirm Positive (CP)", 1, 0),
SP = if_else(status == "Suspected Positive (SP)", 1, 0)) %>%
select(-status)
# Return the result
return(status_dt)
}
# Example usage
M5 <- structure(c(1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1), dim = c(5L, 3L),
dimnames = list(c("1", "2", "3", "4", "5"), NULL))
GTA(
incidence_matrix = M5
, test_results = c("N", "N", "P")
)
#> individual CN CP SP
#> <char> <num> <num> <num>
#> 1: 1 1 0 0
#> 2: 2 1 0 0
#> 3: 3 1 0 0
#> 4: 4 1 0 0
#> 5: 5 0 0 1
GTA(
incidence_matrix = M5
, test_results = c("N", "P", "P")
)
#> individual CN CP SP
#> <char> <num> <num> <num>
#> 1: 1 1 0 0
#> 2: 2 1 0 0
#> 3: 3 0 0 1
#> 4: 4 0 0 1
#> 5: 5 0 0 1
Problems
GTA
in GTA(M5, test_results = c("N", "N", "P"))
classify individual 5 as SP whereas it should be CP.Edited
For GTA(M5, test_results = c("P", "N", "P"))
, the individuals 2, 3, and 4 should be classified as CN, and the individuals 1, and 5 should be classified as CP:
Edited after clarification
From my understanding of your function, you set out the following rules:
If we follow these rules to their logical conclusion, we can get our CN column by just check if each individual had at least one negative test.
Because we will essentially always discard the positive results of any positive test if an individual has had at least one negative. We will multiply each positive test column by the opposite of CN (getting rid of any positives for individuals with a negative test result).
From here we will check for each test if only one individual had a positive test without any negatives. If so, then they are confirmed positive. If not, then all those individuals are SP.
At the end we return our results. The one case where this doesn't work is all negative tests, but we can just return CN = 1 for every individual if that happens.
Lastly, we can use the gtools permutation function to get all combinations of tests and row apply through those permutations. Get the output for every permutation of N and P.
M5 <- structure(c(1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1), dim = c(5L, 3L),
dimnames = list(c("1", "2", "3", "4", "5"), NULL))
GTA <- function(incidence_matrix, test_results) {
num_ind <- nrow(incidence_matrix)
CP <- rep(0L, num_ind)
SP <- rep(0L, num_ind)
if (!("P" %in% test_results)) {
CN <- rep(1L, num_ind)
return(data.frame(individual = dimnames(incidence_matrix)[[1]], CN, CP, SP))
}
CN <- as.integer(rowSums(incidence_matrix[, test_results == "N", drop = FALSE]) != 0)
pos_tests <- incidence_matrix[, test_results == "P", drop = FALSE] * !CN
for (i in 1:ncol(pos_tests)) {
if (sum(pos_tests[, i]) == 1) {
CP[pos_tests[, i] == 1] = 1L
} else if (sum(pos_tests[, i]) > 1) {
SP[pos_tests[, i] == 1] = 1L
}
}
data.frame(individual = dimnames(incidence_matrix)[[1]], CN, CP, SP)
}
test_results <- gtools::permutations(c("P", "N"), n = 2, r = 3, repeats.allowed = TRUE)
test_results
#> [,1] [,2] [,3]
#> [1,] "N" "N" "N"
#> [2,] "N" "N" "P"
#> [3,] "N" "P" "N"
#> [4,] "N" "P" "P"
#> [5,] "P" "N" "N"
#> [6,] "P" "N" "P"
#> [7,] "P" "P" "N"
#> [8,] "P" "P" "P"
apply(test_results, 1, function(test_results) GTA(M5, test_results))
#> [[1]]
#> individual CN CP SP
#> 1 1 1 0 0
#> 2 2 1 0 0
#> 3 3 1 0 0
#> 4 4 1 0 0
#> 5 5 1 0 0
#>
#> [[2]]
#> individual CN CP SP
#> 1 1 1 0 0
#> 2 2 1 0 0
#> 3 3 1 0 0
#> 4 4 1 0 0
#> 5 5 0 1 0
#>
#> [[3]]
#> individual CN CP SP
#> 1 1 1 0 0
#> 2 2 1 0 0
#> 3 3 0 1 0
#> 4 4 1 0 0
#> 5 5 1 0 0
#>
#> [[4]]
#> individual CN CP SP
#> 1 1 1 0 0
#> 2 2 1 0 0
#> 3 3 0 0 1
#> 4 4 0 0 1
#> 5 5 0 0 1
#>
#> [[5]]
#> individual CN CP SP
#> 1 1 0 1 0
#> 2 2 1 0 0
#> 3 3 1 0 0
#> 4 4 1 0 0
#> 5 5 1 0 0
#>
#> [[6]]
#> individual CN CP SP
#> 1 1 0 1 0
#> 2 2 1 0 0
#> 3 3 1 0 0
#> 4 4 1 0 0
#> 5 5 0 1 0
#>
#> [[7]]
#> individual CN CP SP
#> 1 1 0 0 1
#> 2 2 0 0 1
#> 3 3 0 0 1
#> 4 4 1 0 0
#> 5 5 1 0 0
#>
#> [[8]]
#> individual CN CP SP
#> 1 1 0 0 1
#> 2 2 0 0 1
#> 3 3 0 0 1
#> 4 4 0 0 1
#> 5 5 0 0 1
Created on 2024-06-24 with reprex v2.1.0