I have a vector of strings; in my case, the strings are logical rules. There are many such rules, but I show only three here for clarity.
rules <- c("X[,1]>0.5 & X[,2]<1" , "X[,3]>0.2" , "X[,3]>0.3")
I would like to convert the rules to an integer form, something like this:
rules <- c("X[,1]>0.5 & X[,2]<1" , "X[,3]>0.2" , "X[,3]>0.3")
int <- rbind(c(0,0,2,5,0,1,0,0,1,0),c(1,2,0,0,0,0,0,0,0,0),c(1,1,0,0,0,0,0,0,0,0))
cbind.data.frame(rules,int)
                rules 1 2 3 4 5 6 7 8 9 10
1 X[,1]>0.5 & X[,2]<1 0 0 2 5 0 1 0 0 1  0
2           X[,3]>0.2 1 2 0 0 0 0 0 0 0  0
3           X[,3]>0.3 1 1 0 0 0 0 0 0 0  0
There are three conditions:
1. All int vectors must be the same length.
2. If a rule (string) is similar to another rule, their int vectors should be similar too. This is necessary in order to calculate the distance between strings or between int vectors.
3. It must be possible to convert a string to int form, and also to convert the int form back to the string.
Is such a conversion possible?
If all the rules are similar to the ones you showed, one way to do this would be to generate a standard X matrix, parse each of the rules, and apply them to X. That will generate vectors of TRUE and FALSE (which are easily converted to 1 and 0), each of length nrow(X).
For example,
set.seed(123)
# Reference matrix: 1000 rows, 3 columns, uniform on [0, 2]
X <- matrix(runif(3000, 0, 2), nrow = 1000)
rules <- c("X[,1]>0.5 & X[,2]<1" , "X[,3]>0.2" , "X[,3]>0.3")

# Evaluate each rule on X; each row of int is a 0/1 vector of length nrow(X)
int <- matrix(NA, nrow = length(rules), ncol = nrow(X))
for (i in seq_along(rules)) {
  int[i, ] <- as.numeric(eval(parse(text = rules[i])))
}
rownames(int) <- rules

# Pairwise Euclidean distances between the encoded rules
dist <- matrix(NA, length(rules), length(rules),
               dimnames = list(rules, rules))
for (i in seq_along(rules)) {
  for (j in seq_along(rules)) {
    dist[i, j] <- sqrt(sum((int[i, ] - int[j, ])^2))
  }
}
dist
#>                     X[,1]>0.5 & X[,2]<1 X[,3]>0.2 X[,3]>0.3
#> X[,1]>0.5 & X[,2]<1             0.00000  24.67793  24.28992
#> X[,3]>0.2                      24.67793   0.00000   7.28011
#> X[,3]>0.3                      24.28992   7.28011   0.00000
Created on 2021-08-29 by the reprex package (v2.0.0)
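The loops above can probably be written more compactly. The following is only a sketch under a few assumptions: `encode_rule` and `decode_rule` are hypothetical helper names, not part of the original answer, and the "back conversion" shown is merely a lookup among the rules that were already encoded (it cannot reconstruct an arbitrary rule from a 0/1 vector). Note also that the encoding depends on the particular X that was sampled, so the same X has to be reused for all rules.

```r
set.seed(123)
X <- matrix(runif(3000, 0, 2), nrow = 1000)
rules <- c("X[,1]>0.5 & X[,2]<1", "X[,3]>0.2", "X[,3]>0.3")

# Hypothetical helper: evaluate one rule on a reference matrix and return
# a 0/1 integer vector with one entry per row of that matrix.
encode_rule <- function(rule, data) {
  as.integer(eval(parse(text = rule), envir = list(X = data)))
}

# One row per rule; every row has the same length, nrow(X).
# The rule strings become the row names of the result.
int <- t(vapply(rules, encode_rule, integer(nrow(X)), data = X))

# Pairwise Euclidean distances via stats::dist(), as a full square matrix.
d <- as.matrix(dist(int))

# Hypothetical helper: recover the rule(s) whose encoded row equals v.
# This is an exact-match lookup, so two rules that happen to agree on
# every row of X would both be returned.
decode_rule <- function(v, int) {
  rownames(int)[apply(int, 1, function(row) all(row == v))]
}

decode_rule(int[2, ], int)
```

Keeping the rule strings as row names is what makes the lookup back to a string possible in this sketch; if a true inverse is needed, a different encoding would be required.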