Here is a problem I have not been able to solve.
Suppose we have the following code:
## A data frame named a
a <- data.frame(A = c(0,0,1,1,1), B = c(1,0,1,0,0), C = c(0,0,1,1,0), D = c(0,0,1,1,0), E = c(0,1,1,0,1))
## 1st function: builds all pairwise combinations of the colnames of a; the output is a character vector named items2
items2 <- c()
countI <- 1
while(countI <= ncol(a)){
  for(i in countI){
    countJ <- countI + 1
    while(countJ <= ncol(a)){
      for(j in countJ){
        items2 <- c(items2, paste(colnames(a[i]), colnames(a[j]), collapse = '', sep = ""))
      }
      countJ <- countJ + 1
    }
    countI <- countI + 1
  }
}
And here is the code I am trying to fix (the output should be a numeric vector called count_1):
## 2nd function
colnames(a) <- NULL ## only to simplify the calculation
count_1 <- numeric(ncol(a)*2)
countI <- 1
while(countI <= ncol(a)){
  for(i in countI){
    countJ <- countI + 1
    while(countJ <= ncol(a)){
      for(j in countJ){
        s <- a[, i]
        p <- a[, j]
        count_1[i*2] <- as.integer(s[i] == p[j] & s[i] == 1)
      }
      countJ <- countJ + 1
    }
    countI <- countI + 1
  }
}
But when I run this code in the RStudio console, it returns an unexpected result:
count_1
[1] 0 0 0 0 0 1 0 1 0 0
However, I am expecting the following result:
count_1
[1] 1 2 2 2 1 1 1 1 2 1
You can visit the following URL, where an image on Dropbox gives a detailed explanation: https://www.dropbox.com/s/5ylt8h8wx3zrvy7/IMAG1074.jpg?dl=0
I'll try to explain a little more.
I posted the 1st function (code) only as an example of the kind of output I'm looking for.
What I want from the second function (code) is to count, for each pair of columns (AB, for example), the number of rows in which both columns equal 1. The counter starts at counter = 0, and for each row where both columns are 1 we set counter = counter + 1. This is repeated for every pair of columns (AB, AC, AD, AE, BC, BD, BE, CD, CE and DE); the number of pairs is n!/(2!(n-2)!), i.e. 10 for n = 5. For example, if I have the following data frame:
a =
A B C D E
0 1 0 0 0
0 0 0 0 1
1 1 1 1 1
1 0 0 1 0
1 0 1 0 1
Then the number of rows in which both of the first two columns equal 1 is found as follows (note that I set colnames(a) <- NULL only to simplify the work and make things clearer):
0 1 0 0 0
0 0 0 0 1
1 1 1 1 1
1 0 0 1 0
1 0 1 0 1
### Example 1: #####################################################
Here I take columns A and B (AB), i.e. i = 1 and j = 2:
s <- a[, i]
## s is equal to
## [1] 0 0 1 1 1
p <- a[, j]
## p is equal to
## [1] 1 0 1 0 0
Then I count the rows in which both vectors equal 1, i.e. the rows where a[, i] == 1 & a[, j] == 1 (which implies a[, i] == a[, j]); for this example the result is [1] 1
### Example 2: #####################################################
Here I take columns A and D (AD), i.e. i = 1 and j = 4:
s <- a[, i]
## s is equal to
## [1] 0 0 1 1 1
p <- a[, j]
## p is equal to
## [1] 0 0 1 1 0
Then I count the rows in which both vectors equal 1, i.e. the rows where a[, i] == 1 & a[, j] == 1 (which implies a[, i] == a[, j]); for this example the result is [1] 2
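As a minimal sketch of the two examples above: the data frame is rebuilt from the table printed in this post (an assumption, since its column C differs from the data.frame() call at the top), and count_both is a hypothetical helper name. The element-wise & is used because && is meant for single logical values, not whole columns.

```r
# Data frame rebuilt from the printed table (assumption: column C is 0,0,1,0,1)
a <- data.frame(A = c(0,0,1,1,1), B = c(1,0,1,0,0), C = c(0,0,1,0,1),
                D = c(0,0,1,1,0), E = c(0,1,1,0,1))

# Hypothetical helper: number of rows where both vectors equal 1
count_both <- function(s, p) sum(s == 1 & p == 1)

ab <- count_both(a[, 1], a[, 2])  # Example 1, columns A and B -> 1
ad <- count_both(a[, 1], a[, 4])  # Example 2, columns A and D -> 2
```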
And so on.
I'll have a numeric vector named count_1 equal to:
[1] 1 2 2 2 1 1 1 1 2 1
where each element of count_1 corresponds to one pair of columns (count_1 itself carries no names; the labels below are only for reference):
AB AC AD AE BC BD BE CD CE DE
 1  2  2  2  1  1  1  1  2  1
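For reference, the whole rule can be sketched in a few lines with combn(), which enumerates the column pairs in the AB, AC, ..., DE order. This is only a sketch under the assumption that a is the data frame from the printed table (column C = 0,0,1,0,1); the variable name pairs is illustrative.

```r
# Data frame rebuilt from the printed table (assumption, see above)
a <- data.frame(A = c(0,0,1,1,1), B = c(1,0,1,0,0), C = c(0,0,1,0,1),
                D = c(0,0,1,1,0), E = c(0,1,1,0,1))

# 2 x 10 matrix; each column holds one pair of column indices (i, j), i < j
pairs <- combn(ncol(a), 2)

# For each pair, count the rows where both columns equal 1
count_1 <- apply(pairs, 2, function(ij) sum(a[, ij[1]] == 1 & a[, ij[2]] == 1))

# Label each count with its pair of column names
names(count_1) <- apply(pairs, 2, function(ij) paste(colnames(a)[ij], collapse = ""))

count_1
# AB AC AD AE BC BD BE CD CE DE
#  1  2  2  2  1  1  1  1  2  1
```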
Not clear what you're after at all.
As to the first code chunk, that is some ugly R coding involving a whole bunch of unnecessary while/for loops.
You can get the same result items2 in a single line.
items2 <- sort(toupper(unlist(sapply(1:4, function(i)
sapply(5:(i+1), function(j)
paste(letters[i], letters[j], sep = ""))))));
items2;
# [1] "AB" "AC" "AD" "AE" "BC" "BD" "BE" "CD" "CE" "DE"
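A possibly shorter alternative, offered only as a sketch: base R's combn() accepts a function to apply to each combination, so the pairing and pasting can be done in one call. It hard-codes the letters A to E as an assumption; colnames(a) could be passed instead.

```r
# All 2-element combinations of A..E, each pasted into one label
items2 <- combn(LETTERS[1:5], 2, paste, collapse = "")
items2
# "AB" "AC" "AD" "AE" "BC" "BD" "BE" "CD" "CE" "DE"
```

Since combn() already produces the pairs in this order, no sort() or toupper() is needed.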
As to the second code chunk, please explain what you're trying to calculate. It's likely that these while/for loops are as unnecessary as in the first case.
Note that this is based on a as defined at the beginning of your post. Your expected output is based on a different a, which you changed further down the post.
There is no need for a for/while loop; each of the two "functions" can be written as a one-liner.
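To make the mismatch concrete, here is a small sketch that compares the two versions column by column. The names a_top and a_table are hypothetical; a_table is transcribed from the table printed in the question.

```r
# a as defined by the data.frame() call at the top of the question
a_top <- data.frame(A = c(0,0,1,1,1), B = c(1,0,1,0,0), C = c(0,0,1,1,0),
                    D = c(0,0,1,1,0), E = c(0,1,1,0,1))

# a as transcribed from the printed table further down (assumption)
a_table <- data.frame(A = c(0,0,1,1,1), B = c(1,0,1,0,0), C = c(0,0,1,0,1),
                      D = c(0,0,1,1,0), E = c(0,1,1,0,1))

# Names of the columns that contain at least one differing cell
differs <- names(a_top)[colSums(a_top != a_table) > 0]
differs
# "C"
```

Only column C differs, which is why the counts for CD and CE are swapped between the two outputs.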
# Your sample dataframe a
a <- data.frame(A = c(0,0,1,1,1), B = c(1,0,1,0,0), C = c(0,0,1,1,0), D = c(0,0,1,1,0), E = c(0,1,1,0,1))
# Function 1
items2 <- toupper(unlist(sapply(1:(ncol(a) - 1), function(i) sapply(ncol(a):(i+1), function(j)
paste(letters[i], letters[j], sep = "")))));
# Function 2
count_1 <- unlist(sapply(1:(ncol(a) - 1), function(i) sapply(ncol(a):(i+1), function(j)
sum(a[, i] + a[, j] == 2))));
# Add names and sort
names(count_1) <- items2;
count_1 <- count_1[order(names(count_1))];
# Output
count_1;
# AB AC AD AE BC BD BE CD CE DE
#  1  2  2  2  1  1  1  2  1  1
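If a contains only 0s and 1s (an assumption that holds for the sample data), the same counts can also be read off a cross-product matrix, since entry (i, j) of t(X) %*% X is the number of rows where columns i and j are both 1. This is a sketch of an alternative, not a replacement for the code above; m and labels are illustrative names.

```r
a <- data.frame(A = c(0,0,1,1,1), B = c(1,0,1,0,0), C = c(0,0,1,1,0),
                D = c(0,0,1,1,0), E = c(0,1,1,0,1))

# m[i, j] = number of rows where columns i and j are both 1 (0/1 data only)
m <- crossprod(as.matrix(a))

# Transposing and taking the lower triangle walks the pairs in
# AB, AC, AD, AE, BC, ... order, so no sorting is needed
count_1 <- t(m)[lower.tri(m)]

# Build the labels with the same indexing so they line up with the counts
labels <- outer(colnames(a), colnames(a), paste0)
names(count_1) <- t(labels)[lower.tri(m)]

count_1
# AB AC AD AE BC BD BE CD CE DE
#  1  2  2  2  1  1  1  2  1  1
```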