Tags: r, for-loop, matrix, graph, adjacency-matrix

How can I create a new matrix with the non-NA values from another matrix and the row and column they were found in?


I have an n x n matrix named d which is filled with mostly NA values but has some random values spread throughout.

I need to make a new matrix named ds with three columns named head, tail and weight, where weight is the value found in the matrix d, and head and tail are the row and column respectively of d where that particular value for weight was found.

Matrix d:

n = 1000 
d = runif(n*n) 
d[d < 0.80] = NA 
d = matrix(d,nrow=n,ncol=n) #reshape the vector 
diag(d) = NA # no self-loops 
d[upper.tri(d)] = t(d)[upper.tri(d)] # undirected graphs are symmetric

str(d)
num [1:1000, 1:1000] NA NA NA 0.861 NA ...

Desired output of str(ds) and head(ds):

str(ds) 
num [1:99858, 1:3] 1 1 1 1 1 1 1 1 1 1 ... 
- attr(*, "dimnames")=List of 2 
..$ : NULL 
..$ : chr [1:3] "head" "tail" "weight" 

head(ds) 
     head tail    weight 
[1,]    1   15 0.9205357 
[2,]    1   16 0.9938016 
[3,]    1   29 0.9480700

The actual values in the output above are not important because they are randomly generated, but my final output should have the same structure.

What I have tried:

head = c()
tail = c()
weight = c()
for (i in 1:n)
  for (j in 1:n)
    if (is.na(d[i][j]) == FALSE)
      head[i] = i
      tail[i] = j
      weight[i] = d[i][j]
ds = cbind(head, tail, weight)

However, this results in the following:

str(ds)
 num [1:1000, 1:3] NA NA 3 NA NA NA NA NA NA NA ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:3] "head" "tail" "weight"

head(ds)
     head tail weight
[1,]   NA   NA     NA
[2,]   NA   NA     NA
[3,]    3   NA     NA
[4,]   NA   NA     NA
[5,]   NA   NA     NA
[6,]   NA   NA     NA

Again, how can I search through matrix d for non-NA values, and when they are found update the matrix ds with the row it was found in, the column it was found in, and the value itself?
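
For what it's worth, the loop above goes wrong in three places, which explains the output. d[i][j] first takes the i-th element of d as a vector and then the j-th element of that length-one result, which is NA for any j > 1; a matrix cell is d[i, j]. Without braces, each for and if governs only the single statement that follows it, so tail[i] and weight[i] are assigned just once, after both loops have finished. And indexing the results by i leaves room for at most one entry per row. A minimal sketch of a repaired loop follows. It assumes a hypothetical counter k and a preallocated result that is trimmed at the end (neither is in the original attempt), with n reduced to 10 and a seed added so it runs quickly and reproducibly:

```r
set.seed(1)                             # assumption: seed added for reproducibility
n <- 10                                 # assumption: small n so the result is easy to inspect
d <- runif(n * n)
d[d < 0.80] <- NA
d <- matrix(d, nrow = n, ncol = n)      # reshape the vector
diag(d) <- NA                           # no self-loops
d[upper.tri(d)] <- t(d)[upper.tri(d)]   # undirected graphs are symmetric

# Preallocate for the worst case (every cell non-NA), then trim afterwards.
ds <- matrix(NA_real_, nrow = n * n, ncol = 3,
             dimnames = list(NULL, c("head", "tail", "weight")))
k <- 0                                  # hypothetical counter: rows filled so far
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    if (!is.na(d[i, j])) {              # d[i, j], not d[i][j]
      k <- k + 1
      ds[k, ] <- c(i, j, d[i, j])       # write to row k, not row i
    }
  }
}
ds <- ds[seq_len(k), , drop = FALSE]    # drop the unused rows
```

With n = 1000 this visits a million cells one at a time, so it is mainly useful for seeing what the attempt was meant to do.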


Solution

  • We can do this cleanly with two R features for working with matrices: which(arr.ind = TRUE) and matrix indexing in [. Below, I've only modified your setup code to add a seed for reproducibility and to set n to 10, so you can see all the elements and convince yourself that this does what it is supposed to.

    First we use which to return a matrix of the row and column indices of the non-NA entries of d. With arr.ind = TRUE, which returns one index per dimension of the input (here a row and a column) for each match, instead of a single position in the flattened vector. Second, we column-bind the indices with a vector of the corresponding weights, obtained by passing that index matrix to [. You can visually compare d and out and see that the values line up.

    set.seed(1)
    n = 10 
    d = runif(n*n) 
    d[d < 0.80] = NA 
    d = matrix(d,nrow=n,ncol=n) #reshape the vector 
    diag(d) = NA # no self-loops 
    d[upper.tri(d)] = t(d)[upper.tri(d)]
    
    indices <- which(!is.na(d), arr.ind = TRUE)
    out <- cbind(indices, d[indices])
    colnames(out) <- c("head", "tail", "weight")
    head(out)
    #>      head tail    weight
    #> [1,]    4    1 0.9082078
    #> [2,]    6    1 0.8983897
    #> [3,]    7    1 0.9446753
    #> [4,]    8    2 0.9919061
    #> [5,]    9    3 0.8696908
    #> [6,]    1    4 0.9082078
    

    Created on 2019-09-24 by the reprex package (v0.3.0)
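
Because d is symmetric, out lists every edge twice, once in each direction; rows 1 and 6 of the output above are the same edge. The row count in the question's str(ds) (99858, close to 20% of the 499500 unordered pairs for n = 1000) and the ordering of its head(ds) suggest one row per edge, sorted by head. If that reading is right, a minimal sketch is to blank out the lower triangle before calling which and then reorder the rows. The names d_upper and ds below are only illustrative:

```r
set.seed(1)
n <- 10
d <- runif(n * n)
d[d < 0.80] <- NA
d <- matrix(d, nrow = n, ncol = n)      # reshape the vector
diag(d) <- NA                           # no self-loops
d[upper.tri(d)] <- t(d)[upper.tri(d)]   # undirected graphs are symmetric

# Keep only cells with row < column so each undirected edge appears once.
d_upper <- d
d_upper[lower.tri(d_upper)] <- NA

indices <- which(!is.na(d_upper), arr.ind = TRUE)
ds <- cbind(indices, d_upper[indices])
colnames(ds) <- c("head", "tail", "weight")

# which() walks the matrix column by column; reorder by head, then tail.
ds <- ds[order(ds[, "head"], ds[, "tail"]), , drop = FALSE]
```

If both directions of each edge are wanted after all, skip the lower.tri step and keep only the final ordering.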