Search code examples
rloopsmatrixsumifs

Populating Matrix with for loop in R (count and sum)


I already asked a related question here, which was probably not sufficiently specified and thus the answers did not solve my problem. Will try to do it better this time.

I have created dataframe df:

df <- data.frame(names1=c('mouse','dog','cat','cat','mouse','cat','cat','dog','cat','mouse'),                  names2=c('cat','dog','dog','mouse','cat','cat','mouse','mouse','mouse','mouse'),                  values=c(11,5,41,25,101,78,12,41,6,77))

There are purposely no "birds" neither in the names1 nor names2 column.

and the following names vector:

dims <- c('dog','mouse','bird','cat')

and an initially empty matrix:

my_matrix <- matrix(data=0,nrow = length(unique(dims)),ncol = length(unique(dims))) rownames(my_matrix) <- c('dog','mouse','cat', 'bird') colnames(my_matrix) <- c('dog','mouse','cat','bird'))

So the empty matrix look like this:

>        dog mouse bird cat
> dog     0     0    0   0
> mouse   0     0    0   0
> bird    0     0    0   0
> cat     0     0    0   0

**The Goal: **

The goal is to populate the empty matrix with the information coming from dataframe df

  1. Count Matrix

In the first matrix, I want to count the occurrences where the vector elements match. So the first matrix should look exactly like this one (based on df$names1 and df$names2)

>        dog mouse bird cat
> dog     1     1    0   0
> mouse   0     1    0   2
> bird    0     0    0   0
> cat     1     3    0   1
  1. Sum Matrix

In the second matrix, I want to show the sums provided by column df$values and based on the vector match element given by df$names1 and df$names2. So the matrix should look exactly like this one

>        dog mouse bird cat
> dog     5     41    0   0
> mouse   0     77    0   112
> bird    0     0     0   0
> cat     0     137   0   25

Restrictions:

there is one restriction regarding the shape of the matrix - The order of rows and columns is given by the dims vector. The purpose is that the matrix should show that there are no birds in df$names1 and df$names2

My approach:

I tried to place the count and sum into each element of the empty matrix by using a for loop which looks like this:

for(i in 1:nrow(my_matrix)){
  for(j in 1:ncol(my_matrix)){
    my_matrix[i,j] <-  sum(df$values & df$names1[i] == df$names2[j] & df$names1[j] == df$names2[i])  
  }
}

and which provides me with this undesired result

dog mouse bird cat
dog     0     0    0  10
mouse   0    10    0   0
bird    0     0    0   0
cat    10     0    0   0

My intuition tells me that the approach with the for loop is ok but I am not sure how exactly to address each matrix element in the loop and how to define the restrictions within the sum function to get the counts (goal matrix1) and sums (goal matrix2).

Your help will be much appreciated and I am very open to other solutions considering the above-mentioned restrictions (still knowing how to loop through a matrix and assign values to each position would be cool).


Solution

  • To continue your for loop approach - The code you posted only loops through the first 4 elements of the dataframe (e.g. the maximum of nrow() or ncol() of the matrix), and it does not match the names of the matrix columns and rows to the names1 and names2 vectors in the dataframe.

    I'm not sure if the following is exactly what you want, but it does loop through all of the dataframe, relate dataframe and matrix names and count/take sums, so you can modify it as you need.

    my_matrix_count <- my_matrix
    
     for(i in 1:nrow(my_matrix)){
    
       for(j in 1:ncol(my_matrix)){
    
         my_matrix_count[i,j] <-  
           length(df$values[df$names1==row.names(my_matrix)[i] 
                    & df$names2==colnames(my_matrix)[j] ]>0)
    
         my_matrix[i,j] <-  
           sum(df$values[df$names1==row.names(my_matrix)[i] 
                         & df$names2==colnames(my_matrix)[j] ])
      
         }
      }
    

    It's good to know looping and indexing, in my opinion, but yes, it can be done in many better ways, such as the answer by jblood94. For another version, you can also try to write:

      xtabs(values ~names1 + names2, data=df)
    

    Regards, Lars