I would like to generate all combinations of condition strings such as "greater than 0" and "less than 0" (possibly adding other mathematical operators later), with no duplicates either within a row or across rows.
How can I do this? Please see the example below, which includes a partial solution. The partial solution is, however, missing the ">" and "<" signs. What I need is the variable name (a to e in this example) plus the sign used for subsetting, depending on whether the variable is less than or greater than 0.
The comb object holds the variables together with the desired sign for subsetting.
comb <- data.frame(in1=c("a > 0","b > 0","c > 0","d > 0","e > 0"),
                   in2=c("a < 0","b < 0","c < 0","d < 0","e < 0"))
comb.vars <- with(comb, expand.grid(in1, in2, stringsAsFactors=FALSE))
comb.vars <- rbind(data.frame(data.frame(Var3="y > 0"), comb.vars),
                   data.frame(data.frame(Var3="y < 0"), comb.vars))
comb.vars
This does not give the desired outcome, because the same variable can appear in one row with opposing signs. For example, the first row is y > 0 a > 0 a < 0, and row 7 is y > 0 b > 0 b < 0.
dup <- apply(comb.vars, 1, function(x) length(which(duplicated(x)))>0)
remdup1 <- comb.vars[!dup, ]
onlyvars <- apply(remdup1, 2, function(x) substr(x, 1, regexpr("\\>", x)-1))
# remove row-wise duplicates
dup <- apply(onlyvars, 1, function(x) length(which(duplicated(x)))>0)
remdup2 <- onlyvars[!dup, ]
# remove duplicates across rows
uniq <- remdup1[!duplicated(apply(remdup2, 1, function(row) paste(sort(row), collapse=""))), ]
uniq
A base R solution only, please.
You can count how many times the first character (the variable name) is repeated within each row, and then keep only the rows where no variable name is repeated.
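As a minimal illustration of that check on a single row, here is a sketch in base R. The vector row is written out by hand to mimic the first row of comb.vars, and the names first_chars and n_repeats are made up for this example. It assumes every variable name is a single character, as in the question.

```r
# One row of comb.vars from the question, written out by hand.
row <- c(Var3 = "y > 0", Var1 = "a > 0", Var2 = "a < 0")

# Take the first character of each cell (assumes one-letter variable names).
first_chars <- substr(row, 1, 1)

# duplicated() flags every repeat of an earlier value, so the sum is the
# number of repeated variable names in this row.
n_repeats <- sum(duplicated(first_chars))
n_repeats
```

Here a appears twice, so n_repeats is 1 and the row would be dropped. A row such as y > 0, b > 0, a < 0 gives 0 and would be kept.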
Using tidyverse:
library(tidyverse)
comb.vars %>%
  rowwise() %>%
  mutate(
    repvals = sum(duplicated(str_extract(c(Var1, Var2, Var3), "^\\w")))
  ) %>%
  ungroup() %>%
  filter(repvals == 0) %>%
  select(-repvals)
Returns:
# A tibble: 40 × 3
   Var3  Var1  Var2
   <chr> <chr> <chr>
 1 y > 0 b > 0 a < 0
 2 y > 0 c > 0 a < 0
 3 y > 0 d > 0 a < 0
 4 y > 0 e > 0 a < 0
 5 y > 0 a > 0 b < 0
 6 y > 0 c > 0 b < 0
 7 y > 0 d > 0 b < 0
 8 y > 0 e > 0 b < 0
 9 y > 0 a > 0 c < 0
10 y > 0 b > 0 c < 0
# ℹ 30 more rows
A base R version to do the same:
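comb.vars$rep <- apply(comb.vars, 1, function(x) {
  # first character of each cell, then count how many of them repeat
  sum(duplicated(sapply(regmatches(x, gregexec("^\\w", x)), function(m) m[[1]])))
})
comb.vars <- comb.vars[comb.vars$rep == 0, ]
comb.vars$rep <- NULL  # drop the helper column, as select(-repvals) does above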
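If gregexec() is not available (as far as I know it needs R 4.1 or later), the same filter can be sketched with sub() and anyDuplicated() alone. This is only one possible variant, not the only way to do it. It rebuilds the question's grid so that it runs on its own, and it assumes a space always separates the variable name from the operator. The names vars, m, names_only, keep and result are made up for this sketch.

```r
# Rebuild the grid from the question so the sketch is self-contained.
vars <- c("a", "b", "c", "d", "e")
comb.vars <- expand.grid(Var1 = paste(vars, "> 0"),
                         Var2 = paste(vars, "< 0"),
                         stringsAsFactors = FALSE)
comb.vars <- rbind(
  data.frame(Var3 = "y > 0", comb.vars, stringsAsFactors = FALSE),
  data.frame(Var3 = "y < 0", comb.vars, stringsAsFactors = FALSE)
)

# Keep only the variable name in each cell: drop everything from the
# first space onward (assumes "name operator value" with spaces).
m <- as.matrix(comb.vars)
names_only <- m
names_only[] <- sub(" .*$", "", m)

# anyDuplicated() returns 0 when a row has no repeated variable name.
keep <- apply(names_only, 1, anyDuplicated) == 0
result <- comb.vars[keep, ]
nrow(result)
```

Of the 50 rows in the grid, the 10 rows that pair a variable with itself are dropped, leaving 40, which matches the tibble shown above. Because sub() keeps the whole name rather than only its first character, this variant should also work for variable names longer than one letter, as long as the space assumption holds.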