I have the following example of something I've been doing. It is formally simple, but I would like to know what alternatives there are to my code that would make it faster, if possible. Here is the example:
Time1=Sys.time()
v=rep(c("A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P"),
each=1000)
m=matrix(0,ncol=length(v),nrow=length(v))
for (i in 1:length(v)) {
  for (j in 1:length(v)) {
    # compare the values, not the indices, so equal labels are marked
    if (v[i] == v[j]) {
      m[i, j] = 1
    }
  }
}
Time2=Sys.time()
Time2-Time1
# Time difference of 1.405404 mins
I am creating a simple relational matrix: the vector v is laid out along both the rows and the columns, and the matrix marks where the two values are equal. If they are equal, we get m[j,i]=1; if not, m[j,i]=0. As I said, I would like to make this code faster. I was trying to think of a way to write it with an apply function, but I haven't figured that out yet. I would also like to know whether there are other options besides that.
EDIT: I made some corrections on the text and I tried to clarify the question.
Maybe this:
Instead of checking every pair of elements for equality, we can compare only the 16 distinct values and then replicate the resulting matrix 1000 times by row and by column. This gives an equivalent result, but the row and column order is not maintained by this code: the labels cycle through A to P instead of appearing in blocks of 1000. Using the row names and column names, we can verify whether the answer is correct.
I used t() because column binding is faster than row binding.
system.time({
v1 <- c("A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P")
m1 <- sapply(v1, function(x) as.integer(v1 == x))
rownames(m1) <- colnames(m1)
m1 <- do.call('cbind', mget(rep('m1', 1000)))
m1 <- t(m1)
m1 <- do.call('cbind', mget(rep('m1', 1000)))
m1 <- t(m1)
})
# user system elapsed
# 9.32 0.50 9.84
dim(m1)
# [1] 16000 16000
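The ordering caveat mentioned above can be avoided by indexing into the small matrix instead of binding copies of it. What follows is only a minimal sketch, not part of the original answer: it assumes the vector v from the question (shortened here to 3 repeats per letter so it runs quickly), and the names u, small, idx and m_idx are made up for illustration.

```r
# Shortened version of the question's vector (3 repeats per letter instead of 1000)
v <- rep(LETTERS[1:16], each = 3)

u     <- unique(v)                 # the 16 distinct labels
small <- outer(u, u, "==") * 1L    # 16 x 16 comparison on the distinct values only
idx   <- match(v, u)               # position of each element of v within u
m_idx <- small[idx, idx]           # expand to length(v) x length(v), in v's own order
dimnames(m_idx) <- list(v, v)      # label rows and columns for checking

dim(m_idx)
```

Because the expansion follows idx, the rows and columns stay in the same order as v, and the same code should also work when v is not sorted into blocks.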
Another method:
This one (hard-coded) does not do any comparison; instead, we write out the values that comparing the vector with itself would produce.
It can be improved by using an eval(parse(text=paste())) construct.
system.time({
m4 <- matrix(data =
c(
c(rep(1, 1000), rep(0, 15000)),
c(rep(0, 1000), rep(1, 1000), rep(0, 14000)),
c(rep(0, 2000), rep(1, 1000), rep(0, 13000)),
c(rep(0, 3000), rep(1, 1000), rep(0, 12000)),
c(rep(0, 4000), rep(1, 1000), rep(0, 11000)),
c(rep(0, 5000), rep(1, 1000), rep(0, 10000)),
c(rep(0, 6000), rep(1, 1000), rep(0, 9000)),
c(rep(0, 7000), rep(1, 1000), rep(0, 8000)),
c(rep(0, 8000), rep(1, 1000), rep(0, 7000)),
c(rep(0, 9000), rep(1, 1000), rep(0, 6000)),
c(rep(0, 10000), rep(1, 1000), rep(0, 5000)),
c(rep(0, 11000), rep(1, 1000), rep(0, 4000)),
c(rep(0, 12000), rep(1, 1000), rep(0, 3000)),
c(rep(0, 13000), rep(1, 1000), rep(0, 2000)),
c(rep(0, 14000), rep(1, 1000), rep(0, 1000)),
c(rep(0, 15000), rep(1, 1000))), nrow = 16000, ncol = 16000)
})
# user system elapsed
# 0.72 0.93 1.82
Note: As @r2evans said, this will not work if the OP's sample data is not representative of the real data.
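The hard-coded block layout can also be generated rather than typed out. The sketch below is an assumption-laden illustration, not the answerer's code: it assumes equal-sized, consecutive groups as in the question's sample data, and n_groups, group_size, m_blocks and m_general are hypothetical names. The last lines show a direct comparison with outer() as a fallback for data that is not grouped this neatly.

```r
n_groups   <- 16   # number of distinct labels (A to P in the question)
group_size <- 4    # 1000 in the question; kept small here so it runs quickly

# Kronecker product of an identity matrix with a block of ones:
# each 1 on the diagonal becomes a group_size x group_size block of ones
m_blocks <- kronecker(diag(n_groups), matrix(1, group_size, group_size))

# General fallback: compare the labels directly, whatever their order
v <- rep(LETTERS[1:n_groups], each = group_size)
m_general <- outer(v, v, "==") * 1

all.equal(m_blocks, m_general)
```

At the question's full size either matrix has 16000 x 16000 entries, so memory use is likely to matter as much as run time.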